Error when running puma via docker

yjx1217 commented 1 year ago

Hello, we were trying to run puma via docker but encountered the following error. Could you please help us to diagnose what might be the cause. Thanks in advance!

Command:

docker run --rm -v "$(pwd)/data/:/data" -v "$(pwd)/in_out/:/in_out"  kvdlab/puma:1.2.1 run_puma.py \
    -i /in_out/hom_sap_BF288.final.tidy.fa \
    -o /in_out/puma_out \
    -d /data

Error message:


Traceback (most recent call last):
  File "/app/puma/scripts/run_puma.py", line 138, in <module>
    main()
  File "/app/puma/scripts/run_puma.py", line 130, in main
    puma.run(args)
  File "/app/puma/scripts/puma.py", line 2178, in run
    altered_genome = linearize_genome(original_genome, args)
  File "/app/puma/scripts/puma.py", line 107, in linearize_genome
    proteins = identify_main_proteins(extended_genome, args)
  File "/app/puma/scripts/puma.py", line 232, in identify_main_proteins
    blast_out, orfs_fa = blast_main_orfs(genome, args)
  File "/app/puma/scripts/puma.py", line 214, in blast_main_orfs
    num_hits = run_blastp(orfs_fa, blast_sub, blast_out)
  File "/app/puma/scripts/puma.py", line 175, in run_blastp
    stdout, stderr = cmd()
  File "/usr/local/lib/python3.8/site-packages/Bio/Application/__init__.py", line 569, in __call__
    raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'blastp -out /in_out/puma_out/hom_sap_BF288/program_files/main_blast/blast_results_main.tab -outfmt 6 -query /in_out/puma_out/hom_sap_BF288/program_files/main_blast/orfs.fa -evalue 1e-05 -subject /data/main_blast.fa', message 'Command line argument error: Argument "subject". File is not accessible:  `/data/main_blast.fa\''

KVDlab commented 1 year ago

sorry about the delayed response, would you mind sharing the fasta file?

RobJackson28 commented 1 year ago

@yjx1217 I was able to reproduce the error via docker and was able to resolve it as follows:

Remove -v "$(pwd)/data/:/data". This mounts a local data folder which would need to contain the data_dir files.
Change -d /data to -d /app/puma/data_dir. Rather than requiring a mounted local folder, this points directly to the required data_dir files that are already within the docker container.
Make sure you have a local folder that contains your input fasta file. In the working code below, I named this "input_and_output", just to keep it unique. This local folder gets mounted as "in_out". The output folder will get written to this same folder.

This docker code runs successfully: sudo docker run --rm -v "$(pwd)/input_and_output/:/in_out" kvdlab/puma:1.2.1 run_puma.py -i /in_out/hom_sap_BF288.final.tidy.fa -o /in_out/puma_out -d /app/puma/data_dir

Hopefully that works for you!

yjx1217 commented 1 year ago

Dear @RobJackson28,

Thanks for the feedback and detailed information. I made the change according to your suggestions and the previous error disappeared. So many thanks for the tips!

However, I now encountered a new error. Could you provide further help please?

The error message is as follows:

Traceback (most recent call last):
  File "/app/puma/scripts/run_puma.py", line 138, in <module>
    main()
  File "/app/puma/scripts/run_puma.py", line 130, in main
    puma.run(args)
  File "/app/puma/scripts/puma.py", line 2278, in run
    virus.update(find_urr(virus))
  File "/app/puma/scripts/puma.py", line 666, in find_urr
    int(genomelen), 1,
NameError: name 'genomelen' is not defined

The input fasta file is further attached.

Best, Jia-Xing

hom_sap_BF288.final.tidy.fa.gz

KVDlab commented 1 year ago

@yjx1217, could you share the fasta sequence?

yjx1217 commented 1 year ago

Hi @KVDlab ，

The input fasta file has been attached in my last reply on the github issue page (https://github.com/KVD-lab/puma/issues/2). The direct URL is as follows: https://github.com/KVD-lab/puma/files/10516081/hom_sap_BF288.final.tidy.fa.gz :-)

Best, Jia-Xing

KVDlab commented 1 year ago

@yjx1217

This new 'puma.py' should run your sequence. That was a weird bug, thanks for pointing it out. We will update the docker and github repo asap

puma.py.zip

yjx1217 commented 1 year ago

Dear @KVDlab ,

Many thanks for the quick fix! I installed puma via docker. So is it possible to directly replace the old puma.py script with the new one within docker? If so, could you guide me how to do it? Thanks in advance!

Best, Jia-Xing

KVDlab commented 1 year ago

@yjx1217 docker should be updated and good to go

yjx1217 commented 1 year ago

Dear @KVDlab ,

Many thanks for the fix and version update! The docker version 1.2.2 works nicely now!

Best, Jia-Xing

yjx1217 commented 1 year ago

Dear @KVDlab

Sorry. While testing with HPV16 and HPV18 reference genomes, we noticed that although the annotated CDS/protein sequences seems to be correct, the genomic coordinates might be wrong.

For example, in the reference annotation of HPV16 (https://www.ncbi.nlm.nih.gov/nuccore/NC_001526.4), the annotation shows>>


[gene](https://www.ncbi.nlm.nih.gov/nuccore/NC_001526.4?from=1892&to=2989)            1892..2989
                     /gene="E2"
                     /locus_tag="HpV16gp4"
                     /db_xref="GeneID:[1489080](https://www.ncbi.nlm.nih.gov/gene/1489080)"
     [CDS](https://www.ncbi.nlm.nih.gov/nuccore/NC_001526.4?from=1892&to=2989)             1892..2989
                     /gene="E2"
                     /locus_tag="HpV16gp4"
                     /note="E2. Plays a role in the initiation of viral DNA
                     replication. Forms E1-E2 dimer with replication protein
                     E1. The E1-E2 complex binds to the replication origin
                     which contains binding sites for both proteins."
                     /codon_start=1
                     /product="regulatory protein E2"
                     /protein_id="[NP_041328.1](https://www.ncbi.nlm.nih.gov/protein/9627106)"
                     /db_xref="GeneID:[1489080](https://www.ncbi.nlm.nih.gov/gene/1489080)"
                     /translation="METLCQRLNVCQDKILTHYENDSTDLRDHIDYWKHMRLECAIYY
                     KAREMGFKHINHQVVPTLAVSKNKALQAIELQLTLETIYNSQYSNEKWTLQDVSLEVY
                     LTAPTGCIKKHGYTVEVQFDGDICNTMHYTNWTHIYICEEASVTVVEGQVDYYGLYYV
                     HEGIRTYFVQFKDDAEKYSKNKVWEVHAGGQVILCPTSVFSSNEVSSPEIIRQHLANH
                     PAATHTKAVALGTEETQTTIQRPRSEPDTGNPCHTTKLLHRDSVDSAPILTAFNSSHK
                     GRINCNSNTTPIVHLKGDANTLKCLRYRFKKHCTLYTAVSSTWHWTGHNVKHKSAIVT
                     LTYDSEWQRDQFLSQVKIPKTITVSTGFMSI"

However, the annotation from PUMA suggests:


E2 start and stop position:
3506,4603

E2 sequence:
atggagactctttgccaacgtttaaatgtgtgtcaggacaaaatactaacacattatgaaaatgatagtacagacctacgtgaccatatagactattggaaacacatgcgcctagaatgtgctatttattacaaggccagagaaatgggatttaaacatattaaccaccaggtggtgccaacactggctgtatcaaagaataaagcattacaagcaattgaactgcaactaacgttagaaacaatatataactcacaatatagtaatgaaaagtggacattacaagacgttagccttgaagtgtatttaactgcaccaacaggatgtataaaaaaacatggatatacagtggaagtgcagtttgatggagacatatgcaatacaatgcattatacaaactggacacatatatatatttgtgaagaagcatcagtaactgtggtagagggtcaagttgactattatggtttatattatgttcatgaaggaatacgaacatattttgtgcagtttaaagatgatgcagaaaaatatagtaaaaataaagtatgggaagttcatgcgggtggtcaggtaatattatgtcctacatctgtgtttagcagcaacgaagtatcctctcctgaaattattaggcagcacttggccaaccaccccgccgcgacccataccaaagccgtcgccttgggcaccgaagaaacacagacgactatccagcgaccaagatcagagccagacaccggaaacccctgccacaccactaagttgttgcacagagactcagtggacagtgctccaatcctcactgcatttaacagctcacacaaaggacggattaactgtaatagtaacactacacccatagtacatttaaaaggtgatgctaatactttaaaatgtttaagatatagatttaaaaagcattgtacattgtatactgcagtgtcgtctacatggcattggacaggacataatgtaaaacataaaagtgcaattgttacacttacatatgatagtgaatggcaacgtgaccaatttttgtctcaagttaaaataccaaaaactattacagtgtctactggatttatgtctatatga

E2 translated sequence:
METLCQRLNVCQDKILTHYENDSTDLRDHIDYWKHMRLECAIYYKAREMGFKHINHQVVPTLAVSKNKALQAIELQLTLETIYNSQYSNEKWTLQDVSLEVYLTAPTGCIKKHGYTVEVQFDGDICNTMHYTNWTHIYICEEASVTVVEGQVDYYGLYYVHEGIRTYFVQFKDDAEKYSKNKVWEVHAGGQVILCPTSVFSSNEVSSPEIIRQHLANHPAATHTKAVALGTEETQTTIQRPRSEPDTGNPCHTTKLLHRDSVDSAPILTAFNSSHKGRINCNSNTTPIVHLKGDANTLKCLRYRFKKHCTLYTAVSSTWHWTGHNVKHKSAIVTLTYDSEWQRDQFLSQVKIPKTITVSTGFMSI

Best, Jia-Xing

KVDlab commented 1 year ago

Hi Jia-Xing, When we made PuMA, we decided the recircularize all genomes at the same position (i.e., the first nt after the L1 stop codon). Since most of the older genomes in genbank recircularize somewhere upstream of E6, our positions will be offset. I hope this help! Koenraad

On Feb 16, 2023, at 11:59 PM, Jia-Xing Yue @.**@.>> wrote:

External Email

Dear @KVDlabhttps://github.com/KVDlab

Sorry. While testing with HPV16 and HPV18 reference genomes, we noticed that although the annotated CDS/protein sequences seems to be correct, the genomic coordinates might be wrong.

For example, in the reference annotation of HPV16 (https://www.ncbi.nlm.nih.gov/nuccore/NC_001526.4), the annotation shows>>

gene 1892..2989 /gene="E2" /locus_tag="HpV16gp4" /db_xref="GeneID:1489080" CDS 1892..2989 /gene="E2" /locus_tag="HpV16gp4" /note="E2. Plays a role in the initiation of viral DNA replication. Forms E1-E2 dimer with replication protein E1. The E1-E2 complex binds to the replication origin which contains binding sites for both proteins." /codon_start=1 /product="regulatory protein E2" /protein_id="NP_041328.1" /db_xref="GeneID:1489080" /translation="METLCQRLNVCQDKILTHYENDSTDLRDHIDYWKHMRLECAIYY KAREMGFKHINHQVVPTLAVSKNKALQAIELQLTLETIYNSQYSNEKWTLQDVSLEVY LTAPTGCIKKHGYTVEVQFDGDICNTMHYTNWTHIYICEEASVTVVEGQVDYYGLYYV HEGIRTYFVQFKDDAEKYSKNKVWEVHAGGQVILCPTSVFSSNEVSSPEIIRQHLANH PAATHTKAVALGTEETQTTIQRPRSEPDTGNPCHTTKLLHRDSVDSAPILTAFNSSHK GRINCNSNTTPIVHLKGDANTLKCLRYRFKKHCTLYTAVSSTWHWTGHNVKHKSAIVT LTYDSEWQRDQFLSQVKIPKTITVSTGFMSI"

However, the annotation from PUMA suggests:

E2 start and stop position: 3506,4603

E2 sequence: atggagactctttgccaacgtttaaatgtgtgtcaggacaaaatactaacacattatgaaaatgatagtacagacctacgtgaccatatagactattggaaacacatgcgcctagaatgtgctatttattacaaggccagagaaatgggatttaaacatattaaccaccaggtggtgccaacactggctgtatcaaagaataaagcattacaagcaattgaactgcaactaacgttagaaacaatatataactcacaatatagtaatgaaaagtggacattacaagacgttagccttgaagtgtatttaactgcaccaacaggatgtataaaaaaacatggatatacagtggaagtgcagtttgatggagacatatgcaatacaatgcattatacaaactggacacatatatatatttgtgaagaagcatcagtaactgtggtagagggtcaagttgactattatggtttatattatgttcatgaaggaatacgaacatattttgtgcagtttaaagatgatgcagaaaaatatagtaaaaataaagtatgggaagttcatgcgggtggtcaggtaatattatgtcctacatctgtgtttagcagcaacgaagtatcctctcctgaaattattaggcagcacttggccaaccaccccgccgcgacccataccaaagccgtcgccttgggcaccgaagaaacacagacgactatccagcgaccaagatcagagccagacaccggaaacccctgccacaccactaagttgttgcacagagactcagtggacagtgctccaatcctcactgcatttaacagctcacacaaaggacggattaactgtaatagtaacactacacccatagtacatttaaaaggtgatgctaatactttaaaatgtttaagatatagatttaaaaagcattgtacattgtatactgcagtgtcgtctacatggcattggacaggacataatgtaaaacataaaagtgcaattgttacacttacatatgatagtgaatggcaacgtgaccaatttttgtctcaagttaaaataccaaaaactattacagtgtctactggatttatgtctatatga

E2 translated sequence: METLCQRLNVCQDKILTHYENDSTDLRDHIDYWKHMRLECAIYYKAREMGFKHINHQVVPTLAVSKNKALQAIELQLTLETIYNSQYSNEKWTLQDVSLEVYLTAPTGCIKKHGYTVEVQFDGDICNTMHYTNWTHIYICEEASVTVVEGQVDYYGLYYVHEGIRTYFVQFKDDAEKYSKNKVWEVHAGGQVILCPTSVFSSNEVSSPEIIRQHLANHPAATHTKAVALGTEETQTTIQRPRSEPDTGNPCHTTKLLHRDSVDSAPILTAFNSSHKGRINCNSNTTPIVHLKGDANTLKCLRYRFKKHCTLYTAVSSTWHWTGHNVKHKSAIVTLTYDSEWQRDQFLSQVKIPKTITVSTGFMSI

Best, Jia-Xing

— Reply to this email directly, view it on GitHubhttps://github.com/KVD-lab/puma/issues/2#issuecomment-1434180140, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AKMLAIFC7J3NKZQ5PE2NTATWX4OW7ANCNFSM6AAAAAATZ56Z54. You are receiving this because you were mentioned.Message ID: @.***>

yjx1217 commented 1 year ago

Dear @KVDlab,

I see. Make sense. Thank you very much!

On another note, after making more tests with puma, we noticed that: 1) The HPV18 L1 gene annotated from puma is always a bit shorter compared with the HPV18 refseq (NC_001357.1) on NCBI. 2) It seems that the E4 gene cannot be annotated by PUMA.

We were wondering if further updates could be applied to make corresponding improvements.

FYI: This is the HPV18 L1 protein sequence based on NCBI REF (NC_001357.1): MCLYTRVLILHYHLLPLYGPLYHPQPLPLHSILVYMVHIIICGHYIILFLKSVNVFPIFLQMALWRPSDNTVYLPPPSVARVVNTDDYVTRTSIFYHAGSSRLLTVGNPYFRVPAGGGNKQDIPKVSAYQYRVFRVQLPDPNKFGLPDNSIYNPETQRLVWACAGVEIGRGQPLGVGLSGHPFYNKLDDTESSHAATSNVSEDVRDNVSVDYKQTQLCILGCAPAIGEHWAKGTACKSRPLSQGDCPPLELKNTVLEDGDMVDTGYGAMDFSTLQDTKCEVPLDICQSICKYPDYLQMSADPYGDSMFFCLRREQLFARHFWNRAGTMGDTVPQSLYIKGTGMRASPGSCVYSPSPSGSIVTSDSQLFNKPYWLHKAQGHNNGICWHNQLFVTVVDTTRSTNLTICASTQSPVPGQYDATKFKQYSRHVEEYDLQFIFQLCTITLTADVMSYIHSMNSSILEDWNFGVPPPPTTSLVDTYRFVQSVAITCQKDAAPAENKDPYDKLKFWNVDLKEKFSLDLDQYPLGRKFLVQAGLRRKPTIGPRKRSAPSATTSSKPAKRVRVRARK

And this is the HPV18 L1 protein sequence annotated by PUMA using the same input genome sequnce: MALWRPSDNTVYLPPPSVARVVNTDDYVTPTSIFYHAGSSRLLTVGNPYFRVPAGGGNKQDIPKVSAYQYRVFRVQLPDPNKFGLPDTSIYNPETQRLVWACAGVEIGRGQPLGVGLSGHPFYNKLDDTESSHAATSNVSEDVRDNVSVDYKQTQLCILGCAPAIGEHWAKGTACKSRPLSQGDCPPLELKNTVLEDGDMVDTGYGAMDFSTLQDTKCEVPLDICQSICKYPDYLQMSADPYGDSMFFCLRREQLFARHFWNRAGTMGDTVPQSLYIKGTGMPASPGSCVYSPSPSGSIVTSDSQLFNKPYWLHKAQGHNNGVCWHNQLFVTVVDTTPSTNLTICASTQSPVPGQYDATKFKQYSRHVEEYDLQFIFQLCTITLTADVMSYIHSMNSSILEDWNFGVPPPPTTSLVDTYRFVQSVAITCQKDAAPAENKDPYDKLKFWNVDLKEKFSLDLDQYPLGRKFLVQAGLRRKPTIGPRKRSAPSATTSSKPAKRVRVRARK

Best, Jia-Xing

KVDlab commented 1 year ago

Hey, Not sure why you are reopening, but if it is due to the E4 and L1 I can explain below

1) As we describe in the PuMA paper, the biological evidence suggests that L1 uses a Met start codon that is typically 4 aa upstream of a conserved W. 2) We do not think ‘E4’ is a protein. We annotate E1^E4, not the E4 exon alone.

Hope that helps

On Feb 26, 2023, at 5:30 PM, Jia-Xing Yue @.**@.>> wrote:

External Email

Reopened #2https://github.com/KVD-lab/puma/issues/2.

— Reply to this email directly, view it on GitHubhttps://github.com/KVD-lab/puma/issues/2#event-8610509584, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AKMLAICOF5BRT5DMFAS255LWZPYSXANCNFSM6AAAAAATZ56Z54. You are receiving this because you were mentioned.Message ID: @.***>

yjx1217 commented 1 year ago

Dear @KVDlab,

Many thanks for the explanation! Very helpful! I am closing the ticket now.

Best, Jia-Xing

KVD-lab / puma

Error when running puma via docker #2