disq-bio / disq

A library for manipulating bioinformatics sequencing formats in Apache Spark
MIT License

Select a Github repository name #1

Closed heuermh closed 6 years ago

heuermh commented 6 years ago

Regarding naming, in the meeting a couple of names were suggested:

@tomwhite would also like to put forward the following (in the Spark sequencing vein):

Re: Apache Spark Trademark Guidelines

Software products, whether commercial or open source, are not allowed to use “Spark” in their name, except in the form “powered by Apache Spark” or “for Apache Spark” when following these specific guidelines.

Re: Basic Name Search Considerations

heuermh commented 6 years ago

-0 to dist-bio, only because it is similar to my own project dsh-bio

Here is a rabbit hole I started going down looking for inspiration for names:
https://en.wikipedia.org/wiki/Distributed_computing
https://en.wikipedia.org/wiki/Massively_parallel
https://en.wikipedia.org/wiki/Embarrassingly_parallel
https://en.wikipedia.org/wiki/Amdahl%27s_law#Parallelization
https://en.wikipedia.org/wiki/Parallel_(geometry)
https://en.wikipedia.org/wiki/Posidonius
https://en.wikipedia.org/wiki/Parallel_postulate
http://sites.math.rutgers.edu/~cherlin/History/Papers2000/eder.html

The word cloud from the ADAM docs is rather boring:

[image: ADAM documentation word cloud]

tomwhite commented 6 years ago

Nice word cloud!

I'm biased, but I like "squark" most at the moment.

magicDGS commented 6 years ago

I agree with @tomwhite - the Spark variants (squark/speeq) sound good.

lbergelson commented 6 years ago

Squark has grown on me. 👍 to it.

heuermh commented 6 years ago

I think Squark is too close to Sqoop, which is a trademarked Apache project already in the Hadoop/Spark ecosystem.

https://sqoop.apache.org/

I also think it fails the "Names derived from “Spark”, such as “sparkly”, are also not allowed" guideline.

And nothing about it says biology or medicine or genomics to me. Anyone have a favorite biologist? Perhaps this list may inspire.

I personally like the "Consider using functional names" guideline. What is the one-line description for this project? Genomics at scale. Parallel genomics. Distributed genomics.

A library for manipulating bioinformatics sequencing formats in Apache Spark.

As a general rule, any code that does not have a Spark or Hadoop dependency, or does not have a "distributed" flavor belongs in htsjdk.

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark and Parquet.

GATK4 aims to bring together well-established tools from the GATK and Picard codebases under a streamlined framework, and to enable selected tools to be run in a massively parallel way on local clusters or in the cloud using Apache Spark.

lbergelson commented 6 years ago

Hmm. It is very close to Spark, so you're probably right that it's a violation.

Some bad / terrible alternatives:
- frankenstein: named after a famous biologist who also used spark
- rddr: RDDer, but dropping the e is all the rage these days
- setter: for when we eventually change to using datasets, has a good dog as a logo option
- panspermia: what biology is more distributed than outer space spores?
- franklin: another scientist who worked with spark
- clusterbam: sharded bams... maybe not good to name ourselves after a banned weapon system though?
- borg: highly parallel distributed software, different copyright issues

I'm drawing a blank on anything good. The functional names are fine, but they're very clunky.

magicDGS commented 6 years ago

Some weird suggestions playing with @heuermh's short/functional descriptions:

And even more weird, based on scattering letters on the words:

I haven't done an extensive search, so it might clash with other products in the wild.

P.S.: just for fun - I realized that my full-name initials fit for a project name - DGS (Distributed Genomics at Scale).

heuermh commented 6 years ago

Yeah, a lot of good words in https://en.wikipedia.org/wiki/Panspermia

If only Anaxagoras or Wickramasinghe were easier to spell. :)

tomwhite commented 6 years ago

A couple of new ones (playing on parallel, distributed, and sequencing):

magicDGS commented 6 years ago

disq is too similar to Disqus, and there is already a Java project of that name for a queue/task executor (https://github.com/intelie/disq).

tomwhite commented 6 years ago

Names don't have to be unique, they just have to not risk confusion. (Search for "confusing similarity" on the Apache Trademarks page https://www.apache.org/foundation/marks/#principles.) Neither of the examples you cite is in the bio or genomics space, so there is little chance of confusion in a user's mind IMO.

cmnbroad commented 6 years ago

Some more:

- bamblaster
- bamifold
- parnomics
- seqstorm
- splitomics

So far, I think squark is my favorite.

tomwhite commented 6 years ago

I'd like to compile a shortlist to vote on. Please nominate up to two names to add to the list. Here are mine:

lbergelson commented 6 years ago

Hmn. I have some new suggestions but I'm not sure they make the shortlist.

heuermh commented 6 years ago

Sorry, still in brainstorm mode

magicDGS commented 6 years ago

For the short list:

tomwhite commented 6 years ago

Great - if everyone who wants to add something to the shortlist can comment here in the next couple of days I'll put together a vote.

lbergelson commented 6 years ago

If we go with zapbam we could try to claim this punching lightning bolt logo for some added pow!

[image: zap bam]

https://github.com/arasatasaygin/openlogos/issues/4

tomwhite commented 6 years ago

I think we're bikeshedding this - let's call it disq and move on. I haven't heard any real objection to disq; it's simple and short - and neutral.

heuermh commented 6 years ago

+1 for disq.

We could get a dot bio domain and include the domain as part of the name to help distinguish from disq.us and related. For a logo, I can find someone to do up something like this:

[image: disq-logo]

Distributed disq throwing! Maybe with a double helix pattern on the discs.

lbergelson commented 6 years ago

I'm still a fan of zapbam. I'm happy to move forward though with either name.

cmnbroad commented 6 years ago

👍 for disq in the interest of moving on.

magicDGS commented 6 years ago

Let's not block this: 👍 to disq

tomwhite commented 6 years ago

Thanks for all the input! I've carried out the rename of the code here: https://github.com/tomwhite/disq.

I think we can rename this GitHub org and repo, import the code, and complete the governance issues.

heuermh commented 6 years ago

Thanks! Closing this as resolved; let's try to resolve the organization #2 and namespace #7 issues next.