BrendelGroup / AEGeAn

Integrated toolkit for analysis and evaluation of annotated genomes
http://brendelgroup.github.io/AEGeAn
ISC License

xtractore --idfile option not working, all sequences are being extracted #183

Closed: tedtoal closed this issue 7 years ago

tedtoal commented 7 years ago

The xtractore --idfile option is not working: all gene sequences are being extracted.

version 0.16.0

My command line:

xtractore -type gene --idfile <(echo "gene:TCM_016803") -o out.fasta my.gff3 my.fa

Also tried without "-type gene"

Also tried by making an actual text file containing "gene:TCM_016803"

Here is that gene in the GFF3:

Matina16_OnlyHap_PacBio	ena	gene	139780	144434	.	-	.	ID=gene:TCM_016803;biotype=protein_coding;description=Gamete expressed protein 1%2C putative;gene_id=TCM_016803;logic_name=ena
Matina16_OnlyHap_PacBio	ena	mRNA	139780	144434	.	-	.	ID=transcript:EOY02276;Parent=gene:TCM_016803;biotype=protein_coding;transcript_id=EOY02276
Matina16_OnlyHap_PacBio	ena	CDS	140302	140554	.	-	1	ID=CDS:EOY02276;Parent=transcript:EOY02276;protein_id=EOY02276
Matina16_OnlyHap_PacBio	ena	exon	140639	140786	.	-	.	Parent=transcript:EOY02276;Name=EOY02276-5;constitutive=1;ensembl_end_phase=2;ensembl_phase=1;exon_id=EOY02276-5;rank=5
Matina16_OnlyHap_PacBio	ena	CDS	140639	140786	.	-	2	ID=CDS:EOY02276;Parent=transcript:EOY02276;protein_id=EOY02276
Matina16_OnlyHap_PacBio	ena	exon	140907	141093	.	-	.	Parent=transcript:EOY02276;Name=EOY02276-4;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=EOY02276-4;rank=4
Matina16_OnlyHap_PacBio	ena	CDS	140907	141093	.	-	0	ID=CDS:EOY02276;Parent=transcript:EOY02276;protein_id=EOY02276
Matina16_OnlyHap_PacBio	ena	exon	141168	141294	.	-	.	Parent=transcript:EOY02276;Name=EOY02276-3;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=EOY02276-3;rank=3
Matina16_OnlyHap_PacBio	ena	CDS	141168	141294	.	-	1	ID=CDS:EOY02276;Parent=transcript:EOY02276;protein_id=EOY02276
Matina16_OnlyHap_PacBio	ena	exon	141381	141968	.	-	.	Parent=transcript:EOY02276;Name=EOY02276-2;constitutive=1;ensembl_end_phase=2;ensembl_phase=2;exon_id=EOY02276-2;rank=2
Matina16_OnlyHap_PacBio	ena	CDS	141381	141968	.	-	1	ID=CDS:EOY02276;Parent=transcript:EOY02276;protein_id=EOY02276
Matina16_OnlyHap_PacBio	ena	CDS	142076	142551	.	-	0	ID=CDS:EOY02276;Parent=transcript:EOY02276;protein_id=EOY02276
Matina16_OnlyHap_PacBio	ena	exon	142076	144434	.	-	.	Parent=transcript:EOY02276;Name=EOY02276-1;constitutive=1;ensembl_end_phase=2;ensembl_phase=0;exon_id=EOY02276-1;rank=1
Matina16_OnlyHap_PacBio	ena_mobile_element	biological_region	142178	143271	.	+	.	external_name=source:scaffold_4_CACTA_0482 763 1856;logic_name=ena_mobile_element
Matina16_OnlyHap_PacBio	ena	five_prime_UTR	142552	144434	.	-	.	Parent=transcript:EOY02276
Matina16_OnlyHap_PacBio	ena	five_prime_UTR	143249	143278	.	+	.	Parent=transcript:EOY02277
Matina16_OnlyHap_PacBio	ena	exon	143249	143486	.	+	.	Parent=transcript:EOY02277;Name=EOY02277-1;constitutive=1;ensembl_end_phase=1;ensembl_phase=0;exon_id=EOY02277-1;rank=1
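For reference, here is a minimal sketch of the selection the reporter expected from --idfile: a GFF3 record is kept only if the value of its ID attribute (column 9) appears in the requested ID list. It assumes one ID per line in the ID file and exact, case-sensitive matching against the ID attribute. The helper names gff3_feature_id and keep_record, along with the 256-byte ID buffer, are hypothetical; this is not xtractore's actual implementation. It also selects only the listed feature itself, not its children (the mRNA, exons, and CDSs above carry the gene ID only in Parent, so they would not match).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not xtractore's code: copy the value of the ID
 * attribute (column 9) of one GFF3 record into buf.
 * Returns 1 on success, 0 if there is no ID attribute or buf is too small. */
static int gff3_feature_id(const char *line, char *buf, size_t bufsize)
{
    assert(line != NULL && buf != NULL && bufsize > 0);
    if (line[0] == '#')
        return 0;                           /* directive or comment line */

    /* Skip the first eight tab-delimited columns. */
    const char *p = line;
    for (int col = 0; col < 8; col++) {
        p = strchr(p, '\t');
        if (p == NULL)
            return 0;                       /* malformed record */
        p++;
    }

    /* Walk the ';'-separated tag=value pairs looking for "ID=". */
    while (*p != '\0' && *p != '\n') {
        size_t len = strcspn(p, ";\n");     /* length of this tag=value pair */
        if (len > 3 && strncmp(p, "ID=", 3) == 0) {
            size_t vlen = len - 3;
            if (vlen >= bufsize)
                return 0;
            memcpy(buf, p + 3, vlen);
            buf[vlen] = '\0';
            return 1;
        }
        p += len;
        if (*p == ';')
            p++;
    }
    return 0;
}

/* Hypothetical: keep the record only if its ID exactly matches one of
 * the n IDs read from the ID file (assumed one ID per line). */
static int keep_record(const char *line, const char *const *ids, size_t n)
{
    char id[256];
    if (!gff3_feature_id(line, id, sizeof id))
        return 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(id, ids[i]) == 0)
            return 1;
    return 0;
}
```

In practice each GFF3 line (for example, read with fgets) would be passed to keep_record along with the IDs loaded from the ID file, and only the kept features' sequences would be extracted.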

standage commented 7 years ago

Hi Ted,

After a couple of quick tests of my own and another look through the code, it looks like this option was a bit of optimistic planning on my part: a feature I fully intended to add, but that slipped through the cracks.

Sorry the usage statement is misleading; I'll try to implement this feature soon.

tedtoal commented 7 years ago

OK, thanks. I think it is a feature most people would expect to use with that program. But the workaround isn't bad: extract everything to a FASTA file, then use samtools faidx to pull out the sequence you want.

I was looking at the object-oriented methodology in the C code yesterday; interesting. Is that your stuff? I was wondering whether more than one interface can be added to a single object, given that the interface pointer is at the start of the object struct.

ted

— Ted Toal, Postdoctoral Researcher Carvajal-Carmona Lab 4502 GBSF, One Shields Ave Davis, CA 956626 (530) 263-5986 twtoal@ucdavis.edu

(Sent by email in reply to Daniel Standage's comment of May 18, 2017, 9:38 PM.)

standage commented 7 years ago

The C coding style in AEGeAn is inspired by that of GenomeTools (which not-so-coincidentally is a dependency). I don't think it's very easy to implement multiple interfaces with a single object, although @satta and @gordon (core GenomeTools devs) would be better suited to discuss that question than I am. :-)
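To illustrate why a second interface is awkward under the "interface pointer first" convention Ted describes, here is a minimal sketch with entirely hypothetical type names (Shape, Named, Rect). It is not how GenomeTools or AEGeAn structure their classes. Only one member can sit at offset 0, so only one interface view is reachable by a plain pointer cast; a second interface has to be embedded at a non-zero offset, and the object is recovered by subtracting offsetof. The Linux kernel's container_of macro is the best-known instance of this idiom; the CONTAINER_OF macro below is a simplified, const-only version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface "Shape": implementers embed a Shape, which holds
 * a pointer to a table of function pointers (a vtable). */
typedef struct Shape Shape;
typedef struct ShapeVtbl { int (*area)(const Shape *self); } ShapeVtbl;
struct Shape { const ShapeVtbl *vtbl; };

/* Hypothetical interface "Named", built the same way. */
typedef struct Named Named;
typedef struct NamedVtbl { const char *(*name)(const Named *self); } NamedVtbl;
struct Named { const NamedVtbl *vtbl; };

/* Step back from a member's address to the start of its enclosing struct. */
#define CONTAINER_OF(ptr, type, member) \
    ((const type *)((const char *)(ptr) - offsetof(type, member)))

/* A concrete object implementing both interfaces. */
typedef struct Rect {
    Shape shape;      /* offset 0: casting &rect to (Shape *) is valid */
    Named named;      /* non-zero offset: needs CONTAINER_OF to get back */
    int w, h;
    const char *name;
} Rect;

static int rect_area(const Shape *self)
{
    const Rect *r = CONTAINER_OF(self, Rect, shape);  /* offset is 0 here */
    return r->w * r->h;
}

static const char *rect_name(const Named *self)
{
    /* A plain (const Rect *)self cast here would point into the middle
     * of the object and read the wrong fields. */
    const Rect *r = CONTAINER_OF(self, Rect, named);
    return r->name;
}

static const ShapeVtbl rect_shape_vtbl = { rect_area };
static const NamedVtbl rect_named_vtbl = { rect_name };

static void rect_init(Rect *r, int w, int h, const char *name)
{
    assert(r != NULL);
    r->shape.vtbl = &rect_shape_vtbl;
    r->named.vtbl = &rect_named_vtbl;
    r->w = w;
    r->h = h;
    r->name = name;
}

/* Generic code sees only an interface pointer. */
static int shape_area(const Shape *s) { return s->vtbl->area(s); }
static const char *named_name(const Named *n) { return n->vtbl->name(n); }
```

Each interface view (&r.shape, &r.named) is a different address, so the object no longer has a single pointer that works for every interface. Callers must pass the right view, and each implementation must know its member's offset. That bookkeeping is part of what makes multiple interfaces per object clumsy in plain C.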

standage commented 7 years ago

Hi @tedtoal, I hope this isn't too late to be useful to you. Updating from the latest version on GitHub should give you a version of xtractore with the --idfile flag working as advertised.

tedtoal commented 7 years ago

Thanks, I’ll update. I’m sure it will be useful in the future.

ted

(Sent by email in reply to Daniel Standage's comment of June 20, 2017, 1:47 PM.)