Closed marade closed 3 years ago
Sorry, I've never used PROKKA before. How are the files different? Might this be related to https://github.com/gamcil/clinker/issues/4?
Quite possibly it is related to #4. PROKKA is probably the most widely used quick-annotation program right now, so I reckon many people will want this. It appears PROKKA uses BioPerl to generate the files, e.g.
https://metacpan.org/pod/Bio::DB::GenBank
It's really quite easy to run PROKKA and generate them yourself. For your convenience I've attached a GenBank file generated by PROKKA, which I ran on Pseudomonas aeruginosa PAO1, though this is not ideal since it's only one contig and it's named '1'. PAO1.zip
Note as well the comments about the GenBank format on the PROKKA home page. Thanks much!
Could you give some more info about the error you were running into? The file you uploaded seems to load in fine on my end
The problem appeared to arise from the contig names generated by a SPAdes genome assembly and then annotated by PROKKA, where clinker would choke on the first (LOCUS) line of the GenBank file, e.g.
LOCUS NODE_1_length_395402_cov_27.667845395402 bp DNA linear
Okay this is definitely BioPython's GenBank parser not being able to parse long locus names, as you said. Unfortunately, there doesn't seem to be a way to get around it since they explicitly count columns when parsing the LOCUS line (i.e. maximum 16 characters for that field unless stealing from the length field, discussed here: https://github.com/biopython/biopython/issues/747).
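For anyone wanting to check a file before running clinker, here is a rough sketch that flags LOCUS lines with over-long names. It is only an approximation: the 16-character limit is the figure quoted above, the function name `find_long_locus_names` is made up for illustration, and the check splits on whitespace rather than counting columns the way the BioPython parser does.

```python
MAX_LOCUS_NAME = 16  # limit quoted in this thread; an assumption, not read from BioPython


def find_long_locus_names(genbank_text, max_len=MAX_LOCUS_NAME):
    """Return (line_number, name) for each LOCUS line whose name token exceeds max_len."""
    problems = []
    for number, line in enumerate(genbank_text.splitlines(), start=1):
        if not line.startswith("LOCUS"):
            continue
        fields = line.split()
        # fields[0] is "LOCUS"; fields[1] is the name (possibly fused with the length)
        if len(fields) > 1 and len(fields[1]) > max_len:
            problems.append((number, fields[1]))
    return problems


example = "LOCUS NODE_1_length_395402_cov_27.667845395402 bp DNA linear\n"
print(find_long_locus_names(example))
```

On the LOCUS line from this issue, the name and the sequence length are fused into a single token, so the whole token is reported as the offending name.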
Unless I can get around to completely switching from the BioPython parser to something else, I don't think there's much I can do about this I'm afraid. In the meantime, could you try the --centre flag in PROKKA to rename your contigs to be NCBI compliant (as mentioned in the PROKKA readme), then run clinker again?
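If changing the PROKKA flags is not an option, another possible workaround is to shorten the contig names in the assembly FASTA before annotating. The sketch below is not part of clinker or PROKKA; `rename_contigs` and the `contig_N` naming scheme are hypothetical, and it assumes a plain FASTA file where every header line starts with `>`.

```python
def rename_contigs(fasta_text, prefix="contig"):
    """Replace each FASTA header with a short sequential name.

    Returns the renamed FASTA text and a dict mapping new names to the
    original header text, so the original contig IDs can be recovered later.
    """
    renamed = []
    mapping = {}
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1
            new_name = f"{prefix}_{count}"
            mapping[new_name] = line[1:].strip()
            renamed.append(">" + new_name)
        else:
            # Sequence lines are passed through unchanged
            renamed.append(line)
    return "\n".join(renamed) + "\n", mapping


fasta = ">NODE_1_length_395402_cov_27.667845\nACGT\n>NODE_2_length_10_cov_1.0\nTTGA\n"
short_fasta, names = rename_contigs(fasta)
print(short_fasta)
print(names)
```

Keeping the mapping around makes it possible to translate the short names back to the original SPAdes contig IDs after annotation.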
I'll try this when I get a chance, though if #10 gets solved this will no longer matter to me, since I try to avoid GenBank format whenever possible.
The good news is using the --compliant switch for PROKKA apparently allows the script to continue beyond where it would previously crash, but see #21 mentioned above.
Will close this one too since the PROKKA flag works and GFF support has been added with v0.0.10.
Is there a way to make this work? Or do you plan to add support for these? Thanks!