ParkinsonLab / MetaPro

GNU General Public License v3.0

MetaGeneMark license key not found #27

Closed DGleason-680 closed 4 months ago

DGleason-680 commented 6 months ago

I have been running a sample set (paired-end reads) through MetaPro on an HPC cluster. The job failed just after 72 hours with the following message at the end of the generated sample_err.txt file:

GeneMark.hmm 400-day license.
License key "/home/dvan/.gm_key" not found.
This file is neccessary in order to use GeneMark.hmm.
SPADES_ok_MGM_fail

I have a MetaGeneMark license key in my /home/dvan directory, named gm_key_64. Is that name okay to use, or do I just need to change the path in my config file? I left the path as "MetaGeneMark_model = /pipeline_tools/mgm/MetaGeneMark_v1.mod", based on the config example in the tutorial:

MetaGeneMark_model: /pipeline_tools/mgm/MetaGeneMark_v1.mod #(This is in the container already. do not alter)

Some guidance on this issue would be greatly appreciated. I'm hoping this is the last issue I run into before actually obtaining some useful results from a successful run.

The following is the database section of my config.ini file:

[Databases]
database_path = /home/dvan/scratch/MetaPro/dbs/
UniVec_Core = %(database_path)s/univec_core/UniVec_Core.fasta
Adapter = %(database_path)s/trimmomatic_adapters/TruSeq3-PE-2.fa
Rfam = %(database_path)s/Rfam/Rfam.cm
DNA_DB = %(database_path)s/chocophlan_1/chocophlan_full.fasta
DNA_DB_Split = %(database_path)s/chocophlan_split/
Prot_DB = %(database_path)s/nr/nr
Prot_DB_reads = %(database_path)s/nr/nr
accession2taxid = %(database_path)s/accession2taxid/accession2taxid
nodes = %(database_path)s/WEVOTE_db/nodes_wevote.dmp
names = %(database_path)s/WEVOTE_db/names_wevote.dmp
Kaiju_db = %(database_path)s/kaiju_mine/kaiju_db_nr.fmi
Centrifuge_db = %(database_path)s/centrifuge_db/nt
SWISS_PROT = %(database_path)s/swiss_prot_db/
SWISS_PROT_map = %(database_path)s/swiss_prot_db/SwissProt_EC_Mapping.tsv
PriamDB = %(database_path)s/PRIAM_db/
DetectDB = %(database_path)s/DETECTv2/
WEVOTEDB = %(database_path)s/WEVOTE_db/
taxid_tree = %(database_path)s/taxid_trees/class_tree.tsv
kraken2_db = %(database_path)s/kraken2_db/
EC_pathway = %(database_path)s/EC_pathway/EC_pathway.txt
path_to_superpath = %(database_path)s/path_to_superpath/pathway_to_superpathway.csv
MetaGeneMark_model = /pipeline_tools/mgm/MetaGeneMark_v1.mod
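
For reference, here is a minimal sketch of how the %(database_path)s references in this section expand. It assumes the file is read with Python's configparser, whose default interpolation uses this syntax; the thread does not confirm how MetaPro itself parses the file. The function name resolve_databases is hypothetical, and only three keys are copied to keep it short.

```python
import configparser

# Trimmed copy of the [Databases] section above (three keys kept for brevity).
CONFIG_TEXT = """
[Databases]
database_path = /home/dvan/scratch/MetaPro/dbs/
Rfam = %(database_path)s/Rfam/Rfam.cm
MetaGeneMark_model = /pipeline_tools/mgm/MetaGeneMark_v1.mod
"""


def resolve_databases(text):
    """Return the [Databases] section with %(...)s references expanded."""
    parser = configparser.ConfigParser()  # BasicInterpolation is the default
    parser.read_string(text)
    # Note: configparser lowercases option names unless told otherwise.
    return dict(parser["Databases"])


if __name__ == "__main__":
    for key, value in resolve_databases(CONFIG_TEXT).items():
        print(f"{key} = {value}")
```

Because database_path ends with a slash, the expanded paths contain a double slash (dbs//Rfam/...), which POSIX systems treat as a single slash. MetaGeneMark_model is an absolute path to the model file and involves no interpolation; nothing in this section points at the license key path named in the error message.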

billytaj commented 6 months ago

https://exon.gatech.edu/GeneMark/index.html

DGleason-680 commented 6 months ago

Thank you, but it is not clear to me what needs to be done for the license key in my situation.

I have a MetaGeneMark license key in my home directory (/home/dvan). Is it possible that it's labeled incorrectly? It is currently "gm_key_64".

billytaj commented 6 months ago

Place the key at /home/dvan/.gm_key_64

Note the leading "."

DGleason-680 commented 6 months ago

That seemed to work - thank you.

However, I'm still not getting successful runs of the pipeline. Every time I rerun the same sample set (paired-end reads), the job fails at a different stage. The last attempt ran for about 2 days and then failed with this message in the sample_out.txt file:

/home/dvan/scratch/project1/output/024-010B/rRNA_filter/data/jobs/pair_1_363_infernal_pp not found. kill the pipe. restart this stage

Any suggestions to remedy this issue?

DGleason-680 commented 5 months ago

Looking forward to any recommendations on how to get a successful run through the pipeline.

billytaj commented 5 months ago

The thing about the rRNA filter step is that we shard the reads and send them all through Infernal, hoping that your system has enough cores to make this step as painless as possible. Because there are a lot of pieces flying around, we need to do some error-checking. This error says that the Infernal post-processing step for pair_1, slice 363 did not produce any output.

A few things you can do:

1) Check whether this is a false-positive error. If the step didn't create anything useful, re-run that specific sub-segment manually. If it did finish successfully, see 2).
2) Bypass the error by adding a job marker. The pipeline creates a bunch of empty files with specific names in the jobs folder; we use these to track whether a parallel job has completed.
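
A minimal sketch of adding a job marker by hand, assuming the marker is simply an empty file whose name matches the one in the error message. The helper name add_job_marker is hypothetical and not part of MetaPro. Only do this after confirming that the slice's output really exists.

```python
from pathlib import Path


def add_job_marker(jobs_dir, marker_name):
    """Create an empty marker file inside an existing jobs_dir and return its path."""
    marker = Path(jobs_dir) / marker_name
    # touch() creates an empty file; an existing marker only gets a new mtime.
    marker.touch(exist_ok=True)
    return marker


if __name__ == "__main__":
    # Directory and marker name taken from the error message quoted earlier.
    jobs = "/home/dvan/scratch/project1/output/024-010B/rRNA_filter/data/jobs"
    print(add_job_marker(jobs, "pair_1_363_infernal_pp"))
```

After the marker is in place, restarting the stage should let the check for pair_1_363_infernal_pp pass, if the pipeline only tests for the file's presence.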

DGleason-680 commented 5 months ago

I re-ran the job and now I'm back to having issues with the MGM license. Two jobs failed after ~80 hours because:

MGM did not produce a report. likely it didn't run
SPADes ran fine, but MGM failed. Check your MetaGeneMark license

I can confirm that I have ".gm_key_64" in my home directory.

$ cat .gm_key_64
AGATCAGACGAATCCACGAGGTACCCTACGTATGTTTTTTTTTTTTTTTTCACAGGCGCCCTTCAGATTCGGACGCCCCC
437719055

Perhaps it is the wrong license? Any support on this would be great and very appreciated so I can get some samples processed.

Also, is this an indication that the job was nearly complete? Is the MGM license part near the end of the workflow?

billytaj commented 5 months ago

It's at the end of the SPAdes run. MGM is used to split the contigs into segments with single genes.

Did you make sure your Docker bind mount includes the home directory where your MGM key should be?

DGleason-680 commented 5 months ago

This is from my job submission script:

apptainer exec -B /home:/home $image python3 /pipeline/MetaPro.py --nhost -c $config -1 $read1 -2 $read2 --verbose_mode leave -o $output

However, my own directory is actually /home/dvan, and this is where the MGM license is located. So I will make this change (i.e., to "-B /home/dvan:/home") and hope it works. Thanks.
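
A small sketch of where a host file ends up inside the container for a given bind, assuming the usual src:dest bind semantics (the contents of src appear under dest). The function path_in_container is hypothetical and only does path arithmetic; it does not call Apptainer.

```python
from pathlib import PurePosixPath


def path_in_container(bind_spec, host_path):
    """Map host_path through one 'src[:dest[:opts]]' bind; None if it is not under src."""
    src, _, rest = bind_spec.partition(":")
    dest = rest.split(":")[0] or src  # a bare '-B /home' mounts at the same path
    try:
        relative = PurePosixPath(host_path).relative_to(src)
    except ValueError:
        return None  # host_path is outside the bound directory
    return str(PurePosixPath(dest) / relative)


if __name__ == "__main__":
    key = "/home/dvan/.gm_key"
    print(path_in_container("/home:/home", key))
    print(path_in_container("/home/dvan:/home", key))
```

With "-B /home:/home" the key keeps the path /home/dvan/.gm_key, which is the path named in the GeneMark error message. With "-B /home/dvan:/home" the same file would appear at /home/.gm_key instead. Depending on site configuration, Apptainer may also mount the user's home directory on its own, so it is worth listing the file from inside the container before a long run.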

DGleason-680 commented 5 months ago

It's important to note that the MGM key has to be labelled ".gm_key".
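
A minimal sketch of a pre-flight check for the key name, based on the error message in this thread, which looks for ".gm_key" in the home directory. The helper ensure_gm_key is hypothetical; the candidate names are the ones mentioned above.

```python
import shutil
from pathlib import Path


def ensure_gm_key(home, candidates=("gm_key_64", ".gm_key_64")):
    """Make sure home/.gm_key exists, copying from a candidate file if needed.

    Returns the path to .gm_key, or None if no key file was found.
    """
    home = Path(home)
    target = home / ".gm_key"
    if target.is_file():
        return target
    for name in candidates:
        source = home / name
        if source.is_file():
            shutil.copyfile(source, target)  # keep the original file in place
            return target
    return None


if __name__ == "__main__":
    result = ensure_gm_key(Path.home())
    print(result if result else "No GeneMark key found in the home directory")
```

Running this before submitting a job catches a misnamed key in seconds rather than after the SPAdes stage has finished.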