bigdatagenomics / bdg-formats

Open source formats for scalable genomic processing systems using Avro. Apache 2 licensed.

Lengthen "abbreviated" field names #126

Closed: fnothaft closed this issue 5 years ago

fnothaft commented 7 years ago

Pet peeve of mine. We've got a variety of fields like AlignmentRecord.qual and AlignmentRecord.origQual that are unnecessarily short.

heuermh commented 7 years ago

+1

Gasta88 commented 7 years ago

@fnothaft Hi, I'm very new to this project and would like to contribute to this issue.

Is the file to edit bdg-formats/src/main/resources/avro/bdg.avdl? Which nomenclature is preferred between the two?

heuermh commented 7 years ago

> Is the file to edit bdg-formats/src/main/resources/avro/bdg.avdl?

Yes. You can then build the Java code with mvn install and the Javadocs with mvn javadoc:javadoc.

> Which nomenclature is preferred between the two?

qual → quality, origQual → originalQuality, oldCigar → originalCigar, oldPosition → originalPosition, mapq → mappingQuality, etc.

This might also be a good opportunity to add better field-level documentation to AlignmentRecord, RecordGroup, and Fragment, making explicit the mapping between fields in Avro records and the SAM/BAM specifications. See e.g. the docs mapping to the VCF specification for Variant, BED and GFF3 specifications for Feature, etc.
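As a rough illustration of the two suggestions above (longer names plus field-level documentation), a renamed fragment of AlignmentRecord might look like the sketch below. The field types, defaults, namespace, and protocol name are assumptions for illustration and are not copied from bdg.avdl. The SAM references (the MAPQ and QUAL columns and the OQ, OC, and OP optional tags) come from the SAM specification. The @aliases annotations are optional; they only show how Avro IDL can record a field's previous name, and nothing in this thread says the project uses them.

```avdl
// Hypothetical wrapper: the real namespace and protocol are defined in bdg.avdl.
@namespace("org.example.sketch")
protocol Sketch {

  record AlignmentRecord {

    /** Mapping quality of this alignment. MAPQ (column 5) in SAM. Formerly mapq. */
    union { null, int } @aliases(["mapq"]) mappingQuality = null;

    /** Base qualities of the read. QUAL (column 11) in SAM. Formerly qual. */
    union { null, string } @aliases(["qual"]) quality = null;

    /** Base qualities before recalibration. OQ optional tag in SAM. Formerly origQual. */
    union { null, string } @aliases(["origQual"]) originalQuality = null;

    /** CIGAR before realignment. OC optional tag in SAM. Formerly oldCigar. */
    union { null, string } @aliases(["oldCigar"]) originalCigar = null;

    /** Mapping position before realignment. OP optional tag in SAM. Formerly oldPosition. */
    union { null, long } @aliases(["oldPosition"]) originalPosition = null;
  }
}
```

Because each description sits in a /** ... */ comment directly above its field, it should appear as that field's documentation in the generated Java classes and Javadoc.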

Gasta88 commented 7 years ago

Thanks @heuermh. I've set up Maven 3.5 on my Windows 7 machine (running JDK 8u65).

I created a project and edited its pom.xml file to add the dependency cited in README.md. When I run mvn install, I get the following error:

H:\Documents\Apache-Maven\my-app>mvn install
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building my-app 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: https://repo.maven.apache.org/maven2/org/bdgenomics/bdg-formats/bdg-formats/3.8.1/bdg-formats-3.8.1.pom
[WARNING] The POM for org.bdgenomics.bdg-formats:bdg-formats:jar:3.8.1 is missing, no dependency information available
Downloading: https://repo.maven.apache.org/maven2/org/bdgenomics/bdg-formats/bdg-formats/3.8.1/bdg-formats-3.8.1.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2.953 s
[INFO] Finished at: 2017-05-19T12:02:04+01:00
[INFO] Final Memory: 9M/107M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project my-app: Could not resolve dependencies for project com.mycompany.app:my-app:jar:1.0-SNAPSHOT: Could not find artifact org.bdgenomics.bdg-formats:bdg-formats:jar:3.8.1 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

It seems that Maven can't find the bdg-formats dependency in the central repository. Am I missing something?

heuermh commented 7 years ago

Sorry, I'm not quite sure what you're trying to do there.

To contribute a pull request to bdg-formats to address this issue:

Clone your fork of the bdg-formats repo

$ git clone https://github.com/Gasta88/bdg-formats
$ cd bdg-formats
$ git remote add upstream https://github.com/bigdatagenomics/bdg-formats.git

Create a new feature branch for this issue, edit bdg.avdl, and build using Maven

$ git checkout -b lengthen-field-names
$ emacs src/main/resources/avro/bdg.avdl
$ mvn install
$ mvn javadoc:javadoc

The generated javadocs are found at target/site/apidocs/index.html.

Commit, push to origin

$ git commit -m "Lengthen abbreviated field names.  Fixes #126" .
$ git push origin lengthen-field-names

Create a new pull request on GitHub

Gasta88 commented 7 years ago

Thanks, this is much clearer. I have a few further questions about improving the Javadoc.

1) Taking AlignmentRecord as an example, there is a mixture of documentation comments (denoted with /**blabla*/) and plain line comments (denoted with //blabla). Do all of them need to be converted to the documentation-comment style?

2) Some fields are not obvious to me (for example, RecordGroup has flowOrder, keySequence, sequencingCenter, etc.). Do you have a reference I can use, or shall I prepare a list here?
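
Some context that may help with both questions. In Avro IDL, only a comment that opens with /** is kept as the documentation string of the type or field that follows it. Comments written with // or /* ... */ are ignored by the IDL compiler, so they do not reach the generated Java classes or Javadoc. The three RecordGroup fields named above appear to correspond to the CN, FO, and KS tags of the @RG header line in the SAM specification. The sketch below shows both points; the field types, defaults, namespace, and protocol name are assumptions and are not taken from bdg.avdl.

```avdl
// Hypothetical wrapper: the real namespace and protocol are defined in bdg.avdl.
@namespace("org.example.sketch")
protocol Sketch {

  record RecordGroup {

    // A plain line comment like this one is dropped by the Avro IDL compiler.

    /** Name of the sequencing center that produced the reads. CN tag of the SAM @RG header line. */
    union { null, string } sequencingCenter = null;

    /** Nucleotide bases used for each flow of each read. FO tag of the SAM @RG header line. */
    union { null, string } flowOrder = null;

    /** Nucleotide bases of the key sequence of each read. KS tag of the SAM @RG header line. */
    union { null, string } keySequence = null;
  }
}
```

Under these assumptions, converting a // comment to /** ... */ is only needed where the text should show up in the generated documentation.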