Open nathandunn opened 4 years ago
Current features:
GetOptions("input|i=s" => \$input_file,
"username|u=s" => \$username,
"password|p=s" => \$password,
"url|U=s" => \$url,
"gene_types_in|g=s" => \$gene_types_in,
"pseudogene_type_in|n=s" => \$pseudogene_types_in,
"transcript_types_in|t=s" => \$transcript_types_in,
"exon_types_in|e=s" => \$exon_types_in,
"organism|o=s" => \$organism,
"cds_types_in|d=s" => \$cds_types_in,
"ontology|O=s" => \$ontology,
"gene_type_out|G=s" => \$gene_type_out,
"pseudogene_type_out|N=s" => \$pseudogene_type_out,
"mrna_type_out|M=s" => \$mrna_type_out,
"transcript_type_out|T=s" => \$transcript_type_out,
"exon_type_out|E=s" => \$exon_type_out,
"cds_type_out|D=s" => \$cds_type_out,
"property_ontolgy|R=s" => \$property_ontology,
"comment_type_out|C=s" => \$comment_type_out,
"property_type_out|S=s" => \$property_type_out,
"track_prefix|P=s" => \$annotation_track_prefix,
"disable_cds_recalculation|X" => \$disable_cds_recalculation,
"success_log|l=s" => \$success_log_file,
"error_log|L=s" => \$error_log_file,
"skip|s=s" => \$skip_file,
"test|x" => \$test,
"help|h" => \$help,
"name_attributes=s" => \$name_attributes,
"use_name_for_feature|a" => \$use_name_for_feature);
Will just need to do a few of the options:
--test (easy enough), --disable_cds_recalculation, and --use_name_for_feature
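For the Python side, those three switches could be declared roughly as below. This is only a sketch using the standard-library argparse module: the short flags are copied from the Perl GetOptions call above, while the function name `build_parser` and the description string are hypothetical and not taken from python-apollo.

```python
import argparse


def build_parser():
    """Declare only the three boolean switches discussed above.

    Long and short names mirror the Perl GetOptions spec; everything else
    about this parser (name, description) is an assumption for illustration.
    """
    parser = argparse.ArgumentParser(description="GFF3 loader options (sketch)")
    parser.add_argument("-x", "--test", action="store_true")
    parser.add_argument("-X", "--disable_cds_recalculation", action="store_true")
    parser.add_argument("-a", "--use_name_for_feature", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.test, args.disable_cds_recalculation, args.use_name_for_feature)
```

All three are plain on/off flags in the Perl spec (no `=s`), which is why `store_true` is used here.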
lol
Looking more closely, I think the python and original perl scripts are pretty different, though they share some commonalities (looking here: https://github.com/galaxy-genome-annotation/python-apollo/pull/25)
I think it would make more sense to do this with a clean slate for a newer apollo developed around the OGS calculations.
The fundamental difference is the perl file accumulates the JSON and then sends (or not) the accumulated features all at once, processing features, transcripts, and variants separately. The python script is more focused on specific use-cases and only works at the features level (which is probably sufficient), doing writes to adjust the names, attributes, as it goes.
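To make the contrast concrete, here is a rough sketch of the two loading styles. It is not the code of either script: `send` is a hypothetical callable standing in for whatever actually posts features to Apollo, and the separate handling of transcripts and variants in the perl file is not modelled.

```python
from typing import Callable, Dict, Iterable, List

Feature = Dict[str, object]
Sender = Callable[[List[Feature]], None]  # hypothetical "post to Apollo" hook


def load_incrementally(features: Iterable[Feature], send: Sender) -> None:
    """Write-as-you-go: one call per feature."""
    for feature in features:
        send([feature])


def load_accumulated(features: Iterable[Feature], send: Sender,
                     dry_run: bool = False) -> List[Feature]:
    """Accumulate everything first, then send once (or not at all)."""
    batch = list(features)
    if batch and not dry_run:
        send(batch)
    return batch


if __name__ == "__main__":
    sent: List[List[Feature]] = []
    load_accumulated([{"name": "geneA"}, {"name": "geneB"}], sent.append)
    print(len(sent))  # number of calls made to the sender
```

With the accumulated style, a test-only run falls out naturally: the batch is built and can be inspected, but `send` is never called.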
If we could swap the python to use a more normal way of doing things, accumulating + sending, that's fine for me! I believe the 'write to adjust the names' was a bug/oddity workaround ;)
The only concern I'd have is that it's either 100% success or 0%. As long as you run the creation in a database transaction, and roll back in case of an error creating one of the features, we'd be happy to use a bulk API on the python side.
But of course clean slate sounds fine too, let's just identify the best implementation and implement on both sides?
(not that I have time to work on this while on sabbatical)
@hexylena I'm happy to keep working on it while you're on sabbatical:
> The only concern I'd have is that it's either 100% success or 0%. As long as you run the creation in a database transaction, and roll back in case of an error creating one of the features, we'd be happy to use a bulk API on the python side.
I agree that makes sense.
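As an illustration of the all-or-nothing behaviour being asked for, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in only: Apollo's real persistence layer is not SQLite, and the `feature` table, its columns, and both function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical feature table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature (name TEXT NOT NULL UNIQUE, type TEXT NOT NULL)"
    )
    return conn


def add_features_atomically(conn: sqlite3.Connection,
                            features: List[Tuple[str, str]]) -> int:
    """Insert every feature or none of them; return how many were stored."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if any statement inside it raises.
        with conn:
            conn.executemany(
                "INSERT INTO feature (name, type) VALUES (?, ?)", features
            )
    except sqlite3.Error:
        return 0
    return len(features)


if __name__ == "__main__":
    db = make_db()
    print(add_features_atomically(db, [("g1", "gene"), ("g2", "gene")]))
    # The second batch repeats "g1", so the whole batch is rolled back.
    print(add_features_atomically(db, [("g3", "gene"), ("g1", "gene")]))
```

The point of the sketch is that a failure on any single feature leaves the table exactly as it was before the batch started.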
My only thought is that doing this through the current API will work for reasonably small numbers of features (<10K). For more than that, regardless of the backend, I should open up an API that writes directly to SQL, as doing this via Hibernate is going to be painful.
So we would need to:
- Implement like https://github.com/galaxy-genome-annotation/python-apollo/pull/25
- test: 0 annotations for
- disable_cds . . etc: identical to what was there before I guess?
- use_name: identical to what was there before I guess

Let me know what you think.

export ARROW_GLOBAL_CONFIG_PATH=/Users/nathandunn/repositories/python-apollo/test-data/local-arrow.yml
and
./bootstrap_apollo.sh --nodocker
===
apollo deploy
script