chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

BUG report #83

Closed Hugh-Zhu closed 10 years ago

Hugh-Zhu commented 10 years ago

The 9th column in one line of my GFF3 (.gff) file looks like this: ID=PH01000020G1780;Description="osFTL6 FT-Like6 homologous to Flowering Locus T gene; contains Pfam profile PF01161: Phosphatidylethanolamine-binding protein, expressed". It contains two semicolons, one of which is followed by a space. The list parts then becomes ['ID=PH01000020G1780;Description="osFTL6 FT-Like6 homologous to Flowering Locus T gene','contains Pfam profile PF01161: Phosphatidylethanolamine-binding protein, expressed"'], whose second member has no = in it, and this triggers the AssertionError at the 'assert len(item) == 1, item' line below. My change may not be the best approach. I hope there is a more proper way to fix this bug.
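For readers following along, the splitting described above can be reproduced in a few lines. This is only a sketch: it assumes the parser splits the attribute column on "; " (semicolon plus space), which is what the two-element list in the report suggests. The variable names are illustrative, not taken from the parser.

```python
# Attribute column (9th field) from the report, with an unescaped "; "
# inside the quoted Description value.
attributes = (
    'ID=PH01000020G1780;Description="osFTL6 FT-Like6 homologous to '
    'Flowering Locus T gene; contains Pfam profile PF01161: '
    'Phosphatidylethanolamine-binding protein, expressed"'
)

# Assumption: the separator tried first is "; ".
parts = attributes.split("; ")

for part in parts:
    # The second piece carries no "=", so it cannot be read as key=value.
    print("=" in part, repr(part))
```

The second piece is the tail of the Description value rather than a key=value pair, which is why later key/value handling trips over it.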

chapmanb commented 10 years ago

Hugh; Thanks for the report. This is a tricky case to deal with. The GFF is technically invalid since the extra semi-colon in the description should be escaped:

http://www.sequenceontology.org/gff3.shtml

so that is throwing off the splitting. Where is the original source of the GFF file? It might be worth reporting it to them, since it is likely to be parsed incorrectly by other tools.
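As an illustration of the escaping the GFF3 specification asks for, here is a minimal sketch that percent-encodes the characters reserved in column 9 before a value is written. The helper name `escape_gff3_value` and the exact character table are assumptions made for this example, not part of any library.

```python
from urllib.parse import unquote

# Characters with special meaning in the GFF3 attribute column, plus "%"
# itself and whitespace controls, mapped to their percent-encoded forms.
_RESERVED = {
    "%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
    "\t": "%09", "\n": "%0A", "\r": "%0D",
}

def escape_gff3_value(value):
    """Percent-encode reserved characters in one attribute value (hypothetical helper)."""
    return "".join(_RESERVED.get(ch, ch) for ch in value)

description = "osFTL6 FT-Like6 homologous to Flowering Locus T gene; contains Pfam profile PF01161"
escaped = escape_gff3_value(description)
print(escaped)

# Decoding with the standard library restores the original text.
restored = unquote(escaped)
```

With the semicolon written as %3B, splitting the column on ";" no longer cuts the description in two.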

Thanks for the patch. I also pushed a fix to correctly merge these types of files so the description stays together. Thanks again and please let us know if you have any other problems.
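One way to keep such a description together is sketched below. It is not necessarily the fix that was pushed: it assumes that any piece without an "=" is the tail of the previous value, and the function name `split_attributes` is hypothetical.

```python
def split_attributes(attr_str):
    """Split a GFF3 attribute column, re-joining pieces that lack '='."""
    merged = []
    for piece in attr_str.rstrip(";").split(";"):
        if "=" in piece or not merged:
            merged.append(piece)
        else:
            # No '=': assume this is the remainder of the previous value,
            # cut off at an unescaped semicolon, and glue it back on.
            merged[-1] += ";" + piece
    result = {}
    for piece in merged:
        key, _, value = piece.partition("=")
        result[key.strip()] = value
    return result

attributes = (
    'ID=PH01000020G1780;Description="osFTL6 FT-Like6 homologous to '
    'Flowering Locus T gene; contains Pfam profile PF01161: '
    'Phosphatidylethanolamine-binding protein, expressed"'
)
attrs = split_attributes(attributes)
print(attrs["ID"])
print(attrs["Description"])
```

This heuristic still fails if the text after the stray semicolon itself contains an "=", so escaping the file at its source remains the more reliable remedy.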

Hugh-Zhu commented 10 years ago

I've read the rules for semicolons. I got this .gff file from my teacher, and I don't know its original source. I was only told to do some parsing on it. Maybe it's not a standard GFF3 file created by other tools.
