bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

Error with test dataset #8

aflores18 closed this issue 4 years ago

aflores18 commented 4 years ago

Hello!

I'm interested in using mgefinder on our datasets and followed instructions to install through conda per the guide. I downloaded and extracted the test_workdir files as instructed. I set the environment appropriately for mgefinder in conda and invoked the following command:

$ mgefinder workflow --cores 20 --memory 100000 test_workdir/

However, it appears to have crashed with the following error:


Traceback (most recent call last):
  File "/home/user/miniconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
    sys.exit(cli())
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 251, in genotype
    _genotype(clusterseq, pairfiles, filter_clusters_inferred_assembly, output_file)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 37, in _genotype
    genotypes = genotyper.genotype()
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 106, in genotype
    genotypes = self.resolve_ambiguous_genotypes(genotypes)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 224, in resolve_ambiguous_genotypes
    unresolved, cluster_counts_per_site
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 322, in resolve_all_sample_comparison
    resolved = (pd.merge(unresolved, cluster_counts, how='inner', on=['contig', 'pos_5p', 'pos_3p', 'cluster']).
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 555, in __init__
    self._maybe_coerce_merge_keys()
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 986, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
Error in job genotype while creating output file test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv.
RuleException:
CalledProcessError in line 286 of /home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile:
Command '
if [ "True" == "True" ]; then
mgefinder genotype --filter-clusters-inferred-assembly test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt -o test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv 1> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log 2> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err || (cat test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err; exit 1)
else
mgefinder genotype --no-filter-clusters-inferred-assembly test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt -o test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv 1> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log 2> test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err || (cat test_workdir/03.results/efae_GCF_900639545/log/efae_GCF_900639545.genotype.log.err; exit 1)
fi
' returned non-zero exit status 1.
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile", line 286, in __rule_genotype
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
    sys.exit(cli())
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 51, in workflow
    _workflow(workdir, snakefile, configfile, cores, memory, unlock, rerun_incomplete, keep_going)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow.py", line 26, in _workflow
    shell(cmd)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/Snakefile --config wd=test_workdir/ memory=16000 --cores 20 --configfile /home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/config.yml ' returned non-zero exit status 1.


Obviously, I would like to get the test dataset to run properly before trying our own data. In my experience this is most likely something simple, but my relative inexperience leaves me baffled at this time.

Any suggestions are most welcome.

Tony

durrantmm commented 4 years ago

Hello, thanks for bringing up this bug. Let's try to get that working. A couple of questions for you: did you try running this without the additional --cores 20 --memory 100000 options? Also, how far did it get into the test? At what point did it fail?

Thanks.

aflores18 commented 4 years ago

Thanks for the quick response. Yes, I ran it without specifying cores or memory and it resulted in the same error. Below is the log from the generated results folder:

CHECKING DEPENDENCIES

Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################

PARAMETERS

command: genotype
clusterseq: test_workdir/03.results/efae_GCF_900639545/01.clusterseq.efae_GCF_900639545.tsv
pairfiles: ('test_workdir/01.mgefinder/efae_GCF_900639545/efae_GCF_900639545.all_pair.txt',)
filter_clusters_inferred_assembly: True
output_file: test_workdir/03.results/efae_GCF_900639545/02.genotype.efae_GCF_900639545.tsv
####################
Loading clusterseq file...
Parsing pair files
Loading pair files...
Loading file 1/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1036032/02.pair.ERR1036032.efae_GCF_900639545.tsv
Loading file 2/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1078789/02.pair.ERR1078789.efae_GCF_900639545.tsv
Loading file 3/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1541922/02.pair.ERR1541922.efae_GCF_900639545.tsv
Loading file 4/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1036049/02.pair.ERR1036049.efae_GCF_900639545.tsv
Loading file 5/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1541932/02.pair.ERR1541932.efae_GCF_900639545.tsv
Loading file 6/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1078777/02.pair.ERR1078777.efae_GCF_900639545.tsv
Loading file 7/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1036051/02.pair.ERR1036051.efae_GCF_900639545.tsv
Loading file 8/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1541798/02.pair.ERR1541798.efae_GCF_900639545.tsv
Loading file 9/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1195862/02.pair.ERR1195862.efae_GCF_900639545.tsv
Loading file 10/10: test_workdir/01.mgefinder/efae_GCF_900639545/ERR1541854/02.pair.ERR1541854.efae_GCF_900639545.tsv
Filtering out clusters that are never inferred from an assembly...
Excluding 6 clusters that were only inferred from the reference genome...
Out of 585 candidate insertions, 407 had some inferred identity, while 178 had no inferred identity.
Assigning initial genotypes...
Identifying ambiguous genotypes...
Resolving ambiguous genotypes where possible...


And here are the contents of the ".err" in the same folder:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
    sys.exit(cli())
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 251, in genotype
    _genotype(clusterseq, pairfiles, filter_clusters_inferred_assembly, output_file)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 37, in _genotype
    genotypes = genotyper.genotype()
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 106, in genotype
    genotypes = self.resolve_ambiguous_genotypes(genotypes)
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 224, in resolve_ambiguous_genotypes
    unresolved, cluster_counts_per_site
  File "/home/user/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/genotype.py", line 322, in resolve_all_sample_comparison
    resolved = (pd.merge(unresolved, cluster_counts, how='inner', on=['contig', 'pos_5p', 'pos_3p', 'cluster']).
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 555, in __init__
    self._maybe_coerce_merge_keys()
  File "/home/user/.local/lib/python3.6/site-packages/pandas/core/reshape/merge.py", line 986, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat


It appears to have stopped during the clusterseq. If there is additional output that would be helpful for you, please let me know.

Tony
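The ValueError at the bottom of the traceback is raised by pandas when a merge key has object dtype in one frame and int64 in the other. The thread does not say which of the four key columns was affected or why, so the following is only a minimal sketch of that class of error: the frames and their values are made up, only the key names come from the traceback, and `merge_error` and `align_key_dtypes` are hypothetical helpers, not part of MGEfinder.

```python
import pandas as pd

# Key names copied from the pd.merge call in the traceback; all values are invented.
keys = ["contig", "pos_5p", "pos_3p", "cluster"]

unresolved = pd.DataFrame({
    "contig": ["c1", "c1"],
    "pos_5p": ["100", "250"],   # positions stored as strings
    "pos_3p": ["104", "254"],
    "cluster": [1, 2],
})
cluster_counts = pd.DataFrame({
    "contig": ["c1", "c1"],
    "pos_5p": [100, 250],       # positions stored as integers (int64)
    "pos_3p": [104, 254],
    "cluster": [1, 2],
    "count": [7, 3],
})


def merge_error(left, right):
    """Return the ValueError message raised by the merge, or None if it succeeds."""
    try:
        pd.merge(left, right, how="inner", on=keys)
    except ValueError as exc:
        return str(exc)
    return None


def align_key_dtypes(left, right, on):
    """Cast each key column of `left` to the dtype of the same column in `right`."""
    return left.astype({col: right[col].dtype for col in on})


error_before = merge_error(unresolved, cluster_counts)  # dtype mismatch on the keys
fixed = align_key_dtypes(unresolved, cluster_counts, keys)
merged = pd.merge(fixed, cluster_counts, how="inner", on=keys)

print(error_before)
print(merged)
```

Printing `unresolved.dtypes` and `cluster_counts.dtypes` just before a failing merge is usually the quickest way to see which key column disagrees.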

durrantmm commented 4 years ago

Thank you. This is strange; it may have something to do with a version mismatch among the dependencies. I will work on correcting this. In the meantime, you'll want to check whether all of these dependency versions match:

python = 3.6.9
click = 7.0
pandas = 0.25.3
biopython = 1.75
pysam = 0.15.3
scipy = 1.4.0
networkx = 2.4
tqdm = 4.40.2
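Checking these pins by hand is easy to get wrong, so here is a rough sketch of how the comparison could be scripted. The `EXPECTED` values are the ones listed above; `installed_versions` and `find_mismatches` are hypothetical helper names. It assumes `importlib.metadata` is available (Python 3.8+); on the Python 3.6 environment used in this thread, the `importlib_metadata` backport would have to be installed for the fallback import to work.

```python
try:
    from importlib import metadata  # standard library in Python 3.8+
except ImportError:  # older interpreters: assumes the importlib_metadata backport is installed
    import importlib_metadata as metadata

# Pins quoted in the comment above (the Python version itself is not checked here).
EXPECTED = {
    "click": "7.0",
    "pandas": "0.25.3",
    "biopython": "1.75",
    "pysam": "0.15.3",
    "scipy": "1.4.0",
    "networkx": "2.4",
    "tqdm": "4.40.2",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """Return {name: (expected, installed)} for every pin that is not met exactly."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    mismatches = find_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, have) in mismatches.items():
        print(f"{name}: expected {want}, found {have}")
```

Run inside the activated mgefinder environment, this prints one line per package whose installed version differs from the pin.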

aflores18 commented 4 years ago

Great. Here are the versions of the above dependencies currently installed:

python - 3.6.9
click - 7.0
pandas - 0.25.3
biopython - 1.75
pysam - 0.16.0 (this is the only version not matching)
scipy - 1.4.0
networkx - 2.4
tqdm - 4.40.2
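One detail visible in the traceback is that the pandas frames come from /home/user/.local/lib/python3.6/site-packages, while click and mgefinder come from the conda environment. Whether that played any role here is not established in the thread, but it means a version reported by the package manager is not necessarily the copy Python imports. A small sketch for checking where a module is actually loaded from; `is_inside_prefix` and `import_origin` are hypothetical helper names:

```python
import importlib
import os
import sys


def is_inside_prefix(path, prefix):
    """Return True if `path` lies under the directory `prefix`."""
    path = os.path.abspath(path)
    prefix = os.path.abspath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def import_origin(module_name):
    """Return (file path, version string or None) for the module Python imports."""
    module = importlib.import_module(module_name)
    return getattr(module, "__file__", None), getattr(module, "__version__", None)


if __name__ == "__main__":
    # sys.prefix is the root of the active environment.
    for name in ("pandas", "pysam", "click"):
        try:
            path, version = import_origin(name)
        except ImportError:
            print(f"{name}: not importable")
            continue
        inside = path is not None and is_inside_prefix(path, sys.prefix)
        print(f"{name} {version}: {path} (inside environment: {inside})")
```

Applied to the two kinds of paths in the traceback, `is_inside_prefix` returns True for the click path and False for the pandas path when the prefix is /home/user/miniconda3/envs/mgefinder.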

durrantmm commented 4 years ago

Ok, try installing the correct pysam version with pip install pysam==0.15.3

durrantmm commented 4 years ago

OK, I changed the setup.py file to include version requirements. It should work now if you remove the environment with conda env remove -n mgefinder and then rerun bash install.sh.

aflores18 commented 4 years ago

That did it. It finished the test dataset without errors. I will work on our data.

Thanks for your help!