GMOD / Apollo

Genome annotation editor with a Java Server backend and a Javascript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

implement GFF3 in python-apollo #2410

Open nathandunn opened 4 years ago

nathandunn commented 4 years ago

Implement like https://github.com/galaxy-genome-annotation/python-apollo/pull/25

===

nathandunn commented 4 years ago

Current features:

GetOptions("input|i=s"                   => \$input_file,
           "username|u=s"                => \$username,
           "password|p=s"                => \$password,
           "url|U=s"                     => \$url,
           "gene_types_in|g=s"           => \$gene_types_in,
           "pseudogene_type_in|n=s"      => \$pseudogene_types_in,
           "transcript_types_in|t=s"     => \$transcript_types_in,
           "exon_types_in|e=s"           => \$exon_types_in,
           "organism|o=s"                => \$organism,
           "cds_types_in|d=s"            => \$cds_types_in,
           "ontology|O=s"                => \$ontology,
           "gene_type_out|G=s"           => \$gene_type_out,
           "pseudogene_type_out|N=s"     => \$pseudogene_type_out,
           "mrna_type_out|M=s"           => \$mrna_type_out,
           "transcript_type_out|T=s"     => \$transcript_type_out,
           "exon_type_out|E=s"           => \$exon_type_out,
           "cds_type_out|D=s"            => \$cds_type_out,
           "property_ontolgy|R=s"        => \$property_ontology,
           "comment_type_out|C=s"        => \$comment_type_out,
           "property_type_out|S=s"       => \$property_type_out,
           "track_prefix|P=s"            => \$annotation_track_prefix,
           "disable_cds_recalculation|X" => \$disable_cds_recalculation,
           "success_log|l=s"             => \$success_log_file,
           "error_log|L=s"               => \$error_log_file,
           "skip|s=s"                    => \$skip_file,
           "test|x"                      => \$test,
           "help|h"                      => \$help,
           "name_attributes=s"           => \$name_attributes,
           "use_name_for_feature|a"      => \$use_name_for_feature);

nathandunn commented 4 years ago

https://python-apollo.readthedocs.io/en/latest/commands/annotations.html#load-gff3-command

nathandunn commented 4 years ago

Will just need to do a few of the options:

--test (easy enough), --disable_cds_recalculation, and --use_name_for_feature
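
As an illustration of what carrying those three options over might look like, here is a minimal argparse sketch. The flag names are copied from the Perl GetOptions list above; the parser itself, the help strings, and the `build_parser` name are assumptions for illustration and are not the actual python-apollo command-line interface.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the three Perl options named above."""
    parser = argparse.ArgumentParser(
        description="Sketch of a GFF3 loader command line (not python-apollo's real CLI)"
    )
    # Mirrors "test|x": a boolean switch with no value.
    parser.add_argument("--test", action="store_true",
                        help="assumed meaning: process the input without sending it")
    # Mirrors "disable_cds_recalculation|X".
    parser.add_argument("--disable_cds_recalculation", action="store_true",
                        help="assumed meaning: keep the CDS as given in the GFF3")
    # Mirrors "use_name_for_feature|a".
    parser.add_argument("--use_name_for_feature", action="store_true",
                        help="assumed meaning: name features from the Name attribute")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--test", "--use_name_for_feature"])
```

All three are plain switches in the Perl script (no `=s` suffix), so `store_true` is the closest argparse equivalent.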
lol

nathandunn commented 4 years ago

Looking more closely, I think the Python and original Perl scripts are pretty different, though they share some commonalities (looking here: https://github.com/galaxy-genome-annotation/python-apollo/pull/25)

I think it would make more sense to do this with a clean slate for a newer Apollo developed around the OGS calculations.

The fundamental difference is that the Perl script accumulates the JSON and then sends (or not) the accumulated features all at once, processing features, transcripts, and variants separately. The Python script is more focused on specific use cases and works only at the feature level (which is probably sufficient), issuing writes to adjust names and attributes as it goes.
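
To make the contrast concrete, here is a rough sketch of the two strategies. Everything in it is hypothetical: `RecordingClient` stands in for a real Apollo client and only records what would be sent, and the payload shapes are invented for illustration. Neither function is taken from the Perl script or from python-apollo.

```python
class RecordingClient:
    """Hypothetical stand-in for an Apollo client; it only records requests."""

    def __init__(self):
        self.requests = []

    def send(self, payload):
        self.requests.append(payload)


def load_accumulated(client, records, dry_run=False):
    """Accumulate first, then send one request per kind (Perl-style)."""
    batches = {}
    for kind, feature in records:
        batches.setdefault(kind, []).append(feature)
    if not dry_run:
        for kind, features in batches.items():
            client.send({"kind": kind, "features": features})
    return batches


def load_incremental(client, records):
    """Write each record as it is read, then adjust its name (Python-style).

    Simplification: every record is treated as a plain feature.
    """
    for _kind, feature in records:
        client.send({"op": "add_feature", "feature": feature})
        client.send({"op": "set_name", "name": feature["name"]})


records = [
    ("feature", {"name": "geneA"}),
    ("transcript", {"name": "geneA-RA"}),
    ("feature", {"name": "geneB"}),
]

bulk = RecordingClient()
load_accumulated(bulk, records)

step = RecordingClient()
load_incremental(step, records)
```

With these three records the accumulating loader issues one request per kind, while the incremental loader issues two requests per record, which is where the cost difference comes from on larger inputs.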

hexylena commented 4 years ago

If we could swap the python to use a more normal way of doing things, accumulating + sending, that's fine for me! I believe the 'write to adjust the names' was a bug/oddity workaround ;)

The only concern I'd have is that it's either 100% success or 0%. As long as you run the creation in a database transaction, and roll back in case of an error creating one of the features, we'd be happy to use a bulk API on the python side.
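
The all-or-nothing behaviour described here can be sketched with an in-memory SQLite database. This is only an illustration of the transaction pattern: the `feature` table, its columns, and `create_features_atomically` are assumptions, and Apollo's server side does not use SQLite or this schema.

```python
import sqlite3


def create_features_atomically(conn, features):
    """Insert every feature or none of them.

    Using the connection as a context manager commits if the block
    finishes and rolls back if an exception escapes it.
    """
    try:
        with conn:
            for feature in features:
                conn.execute(
                    "INSERT INTO feature (name, fmin, fmax) VALUES (?, ?, ?)",
                    (feature["name"], feature["fmin"], feature["fmax"]),
                )
    except sqlite3.IntegrityError:
        return False
    return True


def count_features(conn):
    return conn.execute("SELECT COUNT(*) FROM feature").fetchone()[0]


conn = sqlite3.connect(":memory:")
# Hypothetical schema; the UNIQUE constraint gives us a way to force a failure.
conn.execute(
    "CREATE TABLE feature (name TEXT NOT NULL UNIQUE, fmin INTEGER NOT NULL, fmax INTEGER NOT NULL)"
)

good_batch = [
    {"name": "gene1", "fmin": 100, "fmax": 900},
    {"name": "gene2", "fmin": 1500, "fmax": 2400},
]
# The second entry duplicates gene1, so the whole batch should be rejected,
# including gene3, which was inserted before the failure.
bad_batch = [
    {"name": "gene3", "fmin": 3000, "fmax": 3600},
    {"name": "gene1", "fmin": 100, "fmax": 900},
]

first_ok = create_features_atomically(conn, good_batch)
second_ok = create_features_atomically(conn, bad_batch)
```

After the failed batch the table still holds only the two features from the first call, so a client can retry the whole batch without checking for partial writes.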

But of course clean slate sounds fine too, let's just identify the best implementation and implement on both sides?

(not that I have time to work on this while on sabbatical)

nathandunn commented 4 years ago

@hexylena I'm happy to keep working on it while you're on sabbatical:

The only concern I'd have is that it's either 100% success or 0%. As long as you run the creation in a database transaction, and roll back in case of an error creating one of the features, we'd be happy to use a bulk API on the python side.

I agree that makes sense.

My only thought is that doing this with the current API will work for reasonably small numbers of features (<10K). For more than that, regardless of the backend, I should open up an API that writes directly to SQL, as doing this via Hibernate is going to be painful.

So we would need to:

nathandunn commented 4 years ago

Let me know what you think.

nathandunn commented 4 years ago

FYI https://github.com/abretaud/migrate_apollo_db/

nathandunn commented 4 years ago

https://github.com/GMOD/Apollo/issues/2408