filip-husnik / pseudofinder

Detection of pseudogene candidates in bacterial and archaeal genomes.
GNU General Public License v3.0

Annotate module #51

Closed navkahlon240 closed 1 year ago

navkahlon240 commented 1 year ago

I am trying to get the number of pseudogenes from one of my files. It is mentioned that pseudofinder accepts both .gbf and .gbk files as input, so I am using .gbf files. However, it is giving me some errors, which I am pasting below.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/aubnxk001/pseudofinder/pseudofinder.py", line 42, in <module>
    annotate.main()
  File "/scratch/aubnxk001/pseudofinder/modules/annotate.py", line 1111, in main
    args = common.get_args('annotate')
  File "/scratch/aubnxk001/pseudofinder/modules/common.py", line 648, in get_args
    verify_args(args, deprecated_args)
  File "/scratch/aubnxk001/pseudofinder/modules/common.py", line 168, in verify_args
    verify_gbk(args.genome)
  File "/scratch/aubnxk001/pseudofinder/modules/common.py", line 131, in verify_gbk
    for contig in SeqIO.parse(gbk, "genbank"):
  File "/home/aubnxk001/miniconda3/envs/pseudofinder/lib/python3.10/site-packages/Bio/SeqIO/Interfaces.py", line 74, in __next__
    return next(self.records)
  File "/home/aubnxk001/miniconda3/envs/pseudofinder/lib/python3.10/site-packages/Bio/GenBank/Scanner.py", line 516, in parse_records
    record = self.parse(handle, do_features)
  File "/home/aubnxk001/miniconda3/envs/pseudofinder/lib/python3.10/site-packages/Bio/GenBank/Scanner.py", line 499, in parse
    if self.feed(handle, consumer, do_features):
  File "/home/aubnxk001/miniconda3/envs/pseudofinder/lib/python3.10/site-packages/Bio/GenBank/Scanner.py", line 465, in feed
    self._feed_first_line(consumer, self.line)
  File "/home/aubnxk001/miniconda3/envs/pseudofinder/lib/python3.10/site-packages/Bio/GenBank/Scanner.py", line 1571, in _feed_first_line
    raise ValueError("Did not recognise the LOCUS line layout:\n" + line)
ValueError: Did not recognise the LOCUS line layout:
LOCUS NODE_24_length_35456_cov_62.21425135456 bp DNA linear.

mitchso commented 1 year ago

Hi,

Looking at your error, it seems that the file you provided has an improperly formatted LOCUS line. Check out this link: https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

As an example, the locus line could look like this: LOCUS SCU49845 5028 bp DNA PLN 21-JUN-1999

Just guessing from what yours looks like, could there be a missing space in "NODE_24_length_35456_cov_62.21425135456 bp DNA linear", such that it actually should read like "NODE_24_length_35456_cov_62.214251 [SPACE] 35456 bp DNA linear"?
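To check whether this affects one record or many, something like the following could list the suspicious LOCUS lines. This is only a rough sketch: the pattern is a loose check (name, whitespace, length, whitespace, "bp"), not the full GenBank layout, the function name `find_bad_locus_lines` is made up for illustration, and the column spacing in the two example lines is assumed.

```python
import re

# Loose pattern: "LOCUS", a name, a separate numeric length, then "bp".
# This is a simplified check, not the complete GenBank LOCUS specification.
LOCUS_OK = re.compile(r"^LOCUS\s+\S+\s+\d+\s+bp\b")


def find_bad_locus_lines(lines):
    """Return (line_number, text) for LOCUS lines without a separate length field."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("LOCUS") and not LOCUS_OK.match(line):
            bad.append((number, line.rstrip("\n")))
    return bad


# Example lines; the exact spacing here is an assumption.
example = [
    "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999\n",
    "LOCUS       NODE_24_length_35456_cov_62.21425135456 bp   DNA     linear\n",
]

for number, text in find_bad_locus_lines(example):
    print(f"line {number}: {text}")
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `find_bad_locus_lines(open("my_genome.gbf"))`, where `my_genome.gbf` is a placeholder name.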

navkahlon240 commented 1 year ago

Thank you for this information. I don't have much coding experience. If the space is the problem, do you know of any script I could use to fix my file, specifically to add a space before the locus length?

mitchso commented 1 year ago

Hi, I can't be sure that the space is the problem without playing around with the file, but that is what I would check first if I were you. If your file was produced by a good-quality annotation program, it is more likely that a space was accidentally deleted at one position by whoever has been handling the file, and less likely that this is a pervasive problem you would need a script to fix.
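If it does turn out that many records are affected, a repair could be sketched roughly as below. It rests on a strong assumption: that every contig name has the form NODE_<n>_length_<L>_cov_<c>, and that the broken line has the same length <L> glued directly onto the end of the name, as your error message suggests. The function names `fix_locus_line` and `fix_genbank` are hypothetical, and I have not tested this against your file, so whether the repaired lines then parse depends on the rest of each line's layout.

```python
import re

# Assumption: names look like NODE_<n>_length_<L>_cov_<c>, and in a broken
# line the length <L> is repeated immediately after the name with no space.
FUSED = re.compile(
    r"^(?P<prefix>LOCUS\s+)"
    r"(?P<name>NODE_\d+_length_(?P<length>\d+)_cov_[\d.]+?)"
    r"(?P=length)"
    r"(?P<rest>\s+bp\b.*)$"
)


def fix_locus_line(line):
    """Insert a space between a fused name and length; leave other lines unchanged."""
    match = FUSED.match(line.rstrip("\n"))
    if match is None:
        return line
    newline = "\n" if line.endswith("\n") else ""
    return f"{match['prefix']}{match['name']} {match['length']}{match['rest']}{newline}"


def fix_genbank(src_path, dst_path):
    """Copy src_path to dst_path, repairing fused LOCUS lines along the way."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(fix_locus_line(line))


# Spacing in this example line is assumed.
broken = "LOCUS       NODE_24_length_35456_cov_62.21425135456 bp   DNA     linear\n"
print(fix_locus_line(broken), end="")
```

Writing to a new file, e.g. `fix_genbank("my_genome.gbf", "my_genome.fixed.gbf")` with placeholder file names, keeps the original untouched so the two can be compared before running pseudofinder again.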

I will leave it up to you to troubleshoot - it's unfortunate but in bioinformatics you will run into many small issues like this and the only option is to work through them.

Best, Mitch