filip-husnik / pseudofinder

Detection of pseudogene candidates in bacterial and archaeal genomes.
GNU General Public License v3.0

dnds analysis with ref .gbff #21

Closed: eddykay310 closed this issue 2 years ago

eddykay310 commented 3 years ago

command

/pseudofinder/pseudofinder.py annotate -g 180.gbk -db swissprot -op pseudAnnot_swiss --diamond -ref genomic.gbff

output error

Sleuth branch:
Running BLAST
Done with BLAST .
starting Muscle
running Muscle: 100%
done with Muscle .
preparing for codeml
Traceback (most recent call last):
  File "/root/pseudofinder/pseudofinder.py", line 27, in <module>
    annotate.main()
  File "/root/pseudofinder/modules/annotate.py", line 1029, in main
    sleuth.full(args, file_dict)
  File "/root/pseudofinder/modules/sleuth.py", line 1683, in full
    if lengthDiff > float(args.perc_cov):
AttributeError: 'Namespace' object has no attribute 'perc_cov'

Could you please help with the error above?

eddykay310 commented 3 years ago

command

~/pseudofinder/pseudofinder.py selection -a 180.faa -n 180.fna -ra protein.faa -rn genomic.fna -ref genomic.gbff

output error

Starting pipeline...
dnds_out
dnds_out/dnds-analysis
Running BLAST
Traceback (most recent call last):
  File "/root/pseudofinder/pseudofinder.py", line 33, in <module>
    selection.main()
  File "/root/pseudofinder/modules/selection.py", line 459, in main
    outNUC.write(refFnaDict[ls[1]] + "\n")
TypeError: unsupported operand type(s) for +: 'collections.defaultdict' and 'str'

I also tried running only the dN/dS analysis (the selection module) and still got an error.
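The TypeError above is what Python raises when a lookup that is expected to return a sequence string returns a defaultdict instead. With a nested defaultdict, this happens whenever a key has never been stored, because the lookup silently creates an empty inner dictionary. Below is a minimal sketch of that failure mode. The names ref_fna, make_nested and write_sequence are hypothetical, and whether selection.py builds its dictionary this way is an assumption, not something shown in this thread.

```python
from collections import defaultdict
import io


def make_nested():
    # Two-level dictionary: indexing a missing key creates an empty inner defaultdict.
    return defaultdict(lambda: defaultdict(str))


def write_sequence(out, seqs, name):
    """Write seqs[name] plus a newline, failing clearly if it is not a string."""
    value = seqs.get(name)  # .get() does not auto-create missing keys
    if not isinstance(value, str):
        raise KeyError(f"no sequence string stored for {name!r}")
    out.write(value + "\n")


ref_fna = make_nested()
ref_fna["geneA"] = "ATGAAATAA"

try:
    ref_fna["geneB"] + "\n"  # missing key: defaultdict + str
except TypeError as err:
    print("reproduced:", err)

buf = io.StringIO()
write_sequence(buf, ref_fna, "geneA")
```

Checking the type before writing turns a confusing operand error into a message that names the sequence ID that could not be resolved.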

Arkadiy-Garber commented 3 years ago

Hi there Edwin,

Thanks for your interest in PseudoFinder, and for bringing these issues to attention.

Apologies for the confusion regarding this, but we are in the process of retiring the Selection module, which is being replaced with the Sleuth module. The Sleuth module also includes dN/dS analysis and can be run independently or concurrently with the Annotate module when you provide a reference genome. We are in the process of updating the README with the latest updates.

With regard to the AttributeError you pasted above, there was a bug in the software from the latest update. I just fixed this and updated the main repository. Can you please re-download Pseudofinder from GitHub and try the Annotate run again? Let me know if you have further issues or any questions.

Thanks! Arkadiy
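For readers hitting a similar AttributeError: it typically appears when code reads an option from an argparse Namespace that the active parser never registered. The sketch below reproduces the error and shows one defensive pattern. The parser layout, the build_parser and coverage_threshold helpers, and the 0.25 fallback are hypothetical illustrations. The actual fix applied to pseudofinder is not shown in this thread.

```python
import argparse

DEFAULT_PERC_COV = 0.25  # hypothetical fallback, not pseudofinder's actual default


def build_parser(with_perc_cov):
    """Build a demo parser that may or may not register --perc_cov."""
    parser = argparse.ArgumentParser(prog="demo")
    if with_perc_cov:
        parser.add_argument("--perc_cov", type=float, default=DEFAULT_PERC_COV)
    return parser


def coverage_threshold(args):
    # getattr with a default avoids AttributeError when the option was never registered.
    return float(getattr(args, "perc_cov", DEFAULT_PERC_COV))


args = build_parser(with_perc_cov=False).parse_args([])
try:
    args.perc_cov
except AttributeError as err:
    print("reproduced:", err)

print(coverage_threshold(args))
```

Registering the option on every subcommand that reaches the shared code path is usually the cleaner fix. The getattr fallback is only a safety net.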

eddykay310 commented 3 years ago

Hi Arkadiy,

Thanks for the quick response. Appreciate it!

The update you just made introduced a new error and broke the annotate run, which was working before.

Traceback (most recent call last):
  File "/root/pseudofinder/pseudofinder.py", line 27, in <module>
    annotate.main()
  File "/root/pseudofinder/modules/annotate.py", line 958, in main
    args = common.get_args('annotate')
  File "/root/pseudofinder/modules/common.py", line 573, in get_args
    unpack_arg(optional, arg)
  File "/root/pseudofinder/modules/common.py", line 104, in unpack_arg
    name = (arg['short'], arg['long'])
KeyError: 'short'

In addition, can you suggest ways or other bioinformatics tools I can use to tell the type of mutation (frameshift, nonsense, ...) in the pseudogene?

Thank you, Edwin

Arkadiy-Garber commented 3 years ago

Hi Edwin,

Sorry you are still having issues. I just did one more fix, and tested the software out. It seems to work fine now on my end. Please try again with a fresh git clone https://github.com/filip-husnik/pseudofinder.git.

I am not aware of any other bioinformatics tools for this. DFAST and PGAP are annotation pipelines that report pseudogenes, but I don't think those tools report any more information than just "fragmentation", "internal stop codon", or "missing C/N-terminal". Psi-Phi is a tool dedicated to pseudogene identification, but I don't think it is open-source, so you may have to get in touch with the authors of the original publication: https://pubmed.ncbi.nlm.nih.gov/15479949/.

Thanks and good luck! Arkadiy

eddykay310 commented 3 years ago

Ok thank you very much.

Does pseudofinder give information on fragmentation, internal stop codons, and missing C/N-termini, as you mentioned above, in addition to the predicted fragmentation and percentage of genes?

Thanks for this tool!!

Arkadiy-Garber commented 3 years ago

In the pseudos.gff file that is produced with the output, Pseudofinder will tell you which genes are fragmented or shorter/longer than expected. If you provide a closely related reference genome via the -ref flag, it should also generate a file called "sleuth_report.csv", which will provide a detailed overview of all gene disruptions based on a pairwise comparison between the two genomes. This will include information on genes missing start/stop codons, internal stop codons and nonsense mutations, frameshift-inducing indels, and relaxed selection (elevated dN/dS).

Hope this helps! Let me know if you have any other questions. Happy to help! Arkadiy
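To make the disruption categories above concrete, here is a crude, alignment-free heuristic that compares a candidate coding sequence with its reference ortholog. This is not how pseudofinder or the Sleuth module classifies genes. The function classify_disruption and its rules are assumptions for illustration only, and a real analysis would work from a pairwise alignment.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial starts; table 11 allows more


def classify_disruption(ref_cds, query_cds):
    """Return a list of likely disruptions in query_cds relative to ref_cds."""
    ref, query = ref_cds.upper(), query_cds.upper()
    findings = []
    if query[:3] not in START_CODONS:
        findings.append("missing start codon")
    # A length difference that is not a multiple of 3 implies a shifted reading frame.
    if (len(query) - len(ref)) % 3 != 0:
        findings.append("frameshift-inducing indel")
    # Split the query into complete codons, read in frame from its first base.
    usable = len(query) - len(query) % 3
    codons = [query[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            findings.append(f"internal stop codon at codon {index + 1}")
            break
    if not codons or codons[-1] not in STOP_CODONS:
        findings.append("missing stop codon")
    return findings


reference = "ATGAAACCCGGGTAA"
print(classify_disruption(reference, "ATGAAATAGGGGTAA"))  # nonsense mutation
print(classify_disruption(reference, "ATGAAACCGGGTAA"))   # one-base deletion
```

The length-based frameshift test misses compensating indels and cannot locate the indel, which is one reason an alignment-based report such as sleuth_report.csv is more informative.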

eddykay310 commented 3 years ago

Great!! Thanks. It really helps.

It worked but there was an issue during the latter stages of the analysis.

Traceback (most recent call last):
  File "/root/pseudofinder/pseudofinder.py", line 27, in <module>
    annotate.main()
  File "/root/pseudofinder/modules/annotate.py", line 1030, in main
    sleuth.merge(args, file_dict)
  File "/root/pseudofinder/modules/sleuth.py", line 2113, in merge
    bias = sorted([ds, dsNoMercy])[1] / sorted([ds, dsNoMercy])[0]
ZeroDivisionError: float division by zero

Edwin
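The failing expression divides the larger of two dS estimates by the smaller, so it raises whenever the smaller one is exactly zero. A guarded version might look like the sketch below. The ds_bias helper and the choice to return NaN are assumptions, and the fix actually applied in sleuth.py is not shown in this thread.

```python
import math


def ds_bias(ds, ds_no_mercy):
    """Ratio of the larger to the smaller dS estimate, or NaN when undefined."""
    low, high = sorted([ds, ds_no_mercy])
    if low == 0:
        return math.nan  # the ratio is undefined; the caller decides how to report it
    return high / low


print(ds_bias(0.5, 0.25))
print(ds_bias(0.0, 0.3))
```

Returning NaN keeps the merge step running and makes the undefined cases easy to filter later. Raising a descriptive error that names the gene pair would be an equally reasonable design.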

Arkadiy-Garber commented 3 years ago

Hi Edwin,

Thanks, glad that helps. Don't hesitate to reach out with any other questions.

Regarding the error above, I just updated with another fix. The fix addresses the error you are seeing. Let me know if you continue to have issues.

Thanks, Arkadiy

eddykay310 commented 3 years ago

Hi Arkadiy,

Pal2nal now gives the following error at every step, from 0% to 100%:

running pal2nal: 0%
ERROR: number of input seqs differ (aa: 2; nuc: 0)!!

aa 'MUL_2251 MU_97-5290-6255'
nuc ''
running pal2nal: 0%

The "ZeroDivisionError: float division by zero" still persists.

Traceback (most recent call last):
  File "/root/pseudofinder/pseudofinder.py", line 27, in <module>
    annotate.main()
  File "/root/pseudofinder/modules/annotate.py", line 1030, in main
    sleuth.merge(args, file_dict)
  File "/root/pseudofinder/modules/sleuth.py", line 2116, in merge
    bias = sorted([ds, dsNoMercy])[1] / sorted([ds, dsNoMercy])[0]
ZeroDivisionError: float division by zero

Please also check the other commands, such as reannotate: I tried reannotate and it threw the same error that annotate did (KeyError: 'short').

Thanks!
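The pal2nal message above ("aa: 2; nuc: 0") indicates that the protein alignment held two sequences while the nucleotide input held none. One way to surface this earlier is to compare record counts before calling pal2nal, as in the sketch below. The helpers count_fasta_records and ready_for_pal2nal and the file names are hypothetical and are not part of pseudofinder.

```python
import os
import tempfile


def count_fasta_records(path):
    """Count FASTA header lines; a missing file counts as zero records."""
    if not os.path.exists(path):
        return 0
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def ready_for_pal2nal(aa_alignment, nuc_fasta):
    """True only if both files hold the same non-zero number of records."""
    aa = count_fasta_records(aa_alignment)
    nuc = count_fasta_records(nuc_fasta)
    return aa > 0 and aa == nuc


# Demo with throwaway files: two proteins but an empty nucleotide file,
# mirroring "aa: 2; nuc: 0" from the log.
with tempfile.TemporaryDirectory() as tmp:
    aa_path = os.path.join(tmp, "pair.faa")
    nuc_path = os.path.join(tmp, "pair.ffn")
    with open(aa_path, "w") as handle:
        handle.write(">MUL_2251\nMK\n>MU_97-5290-6255\nMK\n")
    open(nuc_path, "w").close()
    demo_counts = (count_fasta_records(aa_path), count_fasta_records(nuc_path))
    demo_ready = ready_for_pal2nal(aa_path, nuc_path)

print(demo_counts, demo_ready)
```

An empty nucleotide file usually points to an upstream step that failed to extract or write the coding sequences, so the check is a diagnostic aid and not a fix.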

Arkadiy-Garber commented 3 years ago

Hi Edwin,

Thanks for letting me know about these issues, and sorry that you continue to get errors. When I run the software on my own .gbff files, the program does not appear to have any issues. Can I ask how the input files you are providing to Pseudofinder were generated? We tested this software mostly with Prokka-generated .gbff files, but it also works with at least some DFAST- and PGAP-generated .gbff/.gbk files.

Going forward, to expedite troubleshooting, it would be more efficient for me to have the .gbff files you are providing to the software. Let me know if that would be possible. No worries either way though. I know that sharing unpublished data online is not always a good idea.

Thanks! Arkadiy

eddykay310 commented 3 years ago

Hi Arkadiy,

Thank you for your response.

I am using a .gbk file from a Prokka annotation (there was no .gbff file among the output files). I use the swissprot db and a .gbff reference to run the pseudofinder annotate program. Previously, the only error was the float division by zero in the dS calculation, but since the last update a few more have come up.

Let me see if I can send you the files I am using.

Thanks! Edwin

Arkadiy-Garber commented 3 years ago

Sounds good, thanks! And how was the .gbff file generated?

You can also email them to me if that is easier: agarber4@asu.edu.

daanaejasso commented 3 years ago

Hello! I ran the 'annotate' module with a reference genome, but the sleuth directory is empty. I then tried running the sleuth analysis again on its own, but it won't accept the .gbk file. Is there anything else I can do to run the module properly?


python3 pseudofinder.py sleuth -rn Halorubrum_ref_cds.fasta -g prokka06062021.gbk -rg GCF_002252755.1_ASM225275v1_genomic.gff -out Pseudofinder_DNDS_Halo/
Running BLAST
Done with BLAST .
starting Muscle

done with codeml .
0 out of 1 input reference CDS had detectable homology (below evalue of 1e-4)
writing output file
gzip: Pseudofinder_DNDS_Halo//nuc_aln.tar.gz already exists; do you wish to overwrite (y or n)? y

Arkadiy-Garber commented 3 years ago

Hi,

Thanks for your interest in pseudofinder! If you are using the sleuth module independently, you will need to provide genome contigs (.fna), instead of a gbk file. Sorry for the confusion. We are currently working to better integrate the sleuth module (a relatively new addition to the pseudofinder package) into the overall pseudofinder framework.

Let me know if this resolves the issue or if you continue getting an error.

Thanks, Arkadiy

daanaejasso commented 3 years ago

The alignment did start, but I'm getting this error for every CDS:

ERROR Cannot open 'Halorubrum_sleuth//nuc_aln/lcl' errno=2

sh: 1: NZ_NHOY01000096.1_cds_WP_094590002.1_1598__AHHLFOFG_01031-1-2197.ffn: not found
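One plausible reading of the two messages above, which is an inference and not something confirmed in this thread, is that the reference CDS headers start with "lcl|", as NCBI CDS FASTA files usually do. An unquoted "|" in a shell command acts as a pipe, which would explain why the path is cut off at "lcl" and the rest is treated as a command. If that is the cause, rewriting the headers before running sleuth may work around it. The sketch below uses hypothetical helpers (sanitize_header, sanitize_lines) and an example header reconstructed from the error message.

```python
import re

# Anything other than letters, digits, underscore, dot and hyphen is replaced.
UNSAFE = re.compile(r"[^A-Za-z0-9_.\-]")


def sanitize_header(line):
    """Keep the first word of a FASTA header and replace shell-unsafe characters."""
    words = line[1:].split()
    name = words[0] if words else ""
    return ">" + UNSAFE.sub("_", name)


def sanitize_lines(lines):
    """Yield FASTA lines with cleaned headers; sequence lines pass through unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield sanitize_header(line) + "\n"
        else:
            yield line


example = [">lcl|NZ_NHOY01000096.1_cds_WP_094590002.1_1598 [gene=x]\n", "ATGAAATAA\n"]
print("".join(sanitize_lines(example)), end="")
```

To apply it to a file, open the source and destination and call fout.writelines(sanitize_lines(fin)). Note that this drops the header description, so keep the original file if those annotations are needed.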