dy-lin closed this issue 1 year ago
It looks like the overall run worked, since the A, C, G, and T text files were written out.
@vincent6liu can you run it with the test data as shown here? It looks like it's failing at the step that you created, possibly because there are only 2 cells?
I tried refreshing my Python environment (creating a new conda environment and then installing mGATK with pip), and that seems to have helped. There are some warning messages, but it looks like the run completed.
This time using python 3.9.12.
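For anyone comparing a working and a failing environment, a small sketch like the one below can record which interpreter and package versions are active. The helper name `report_versions` is hypothetical, and the package list assumes the pip distribution names `mgatk`, `numpy`, and `pandas`; adjust it to whatever is installed.

```python
import sys
from importlib import metadata


def report_versions(packages=("mgatk", "numpy", "pandas")):
    """Return the interpreter version and the installed versions of the given packages."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

Running this inside each conda environment and diffing the output is one way to see what changed between the Python 3.7 and 3.9 setups.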
Log files: base.mgatk.log
The snakemake log file was too large to upload, but here is the tail end:
/projects/karsanlab/dlin_dev/software/.conda/envs/mGATK3/lib/python3.9/site-packages/mgatk/bin/python/variant_calling.py:38: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
rev_base_df[missing_pos] = 0
/projects/karsanlab/dlin_dev/software/.conda/envs/mGATK3/lib/python3.9/site-packages/mgatk/bin/python/variant_calling.py:159: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
'mean_coverage', 'max_heteroplasmy']].astype(np.float)
/home/dlin/.conda/envs/mGATK3/lib/python3.9/site-packages/pandas/core/arraylike.py:397: RuntimeWarning: divide by zero encountered in log10
result = getattr(ufunc, method)(*inputs, **kwargs)
[Tue May 24 10:58:05 2022]
Finished job 5.
5 of 6 steps (83%) done
Select jobs to execute...
[Tue May 24 10:58:05 2022]
localrule all:
input: bc1dmem/final/bc1.depthTable.txt, bc1dmem/final/bc1.A.txt.gz, bc1dmem/final/bc1.C.txt.gz, bc1dmem/final/bc1.G.txt.gz, bc1dmem/final/bc1.T.txt.gz, bc1dmem/final/bc1.coverage.txt.gz, bc1dmem/final/bc1.variant_stats.tsv.gz, bc1dmem/final/bc1.cell_heteroplasmic_df.tsv.gz, bc1dmem/final/bc1.vmr_strand_plot.png
jobid: 0
reason: Input files updated by another job: bc1dmem/final/bc1.vmr_strand_plot.png, bc1dmem/final/bc1.coverage.txt.gz, bc1dmem/final/bc1.depthTable.txt, bc1dmem/final/bc1.T.txt.gz, bc1dmem/final/bc1.G.txt.gz, bc1dmem/final/bc1.C.txt.gz, bc1dmem/final/bc1.A.txt.gz, bc1dmem/final/bc1.variant_stats.tsv.gz, bc1dmem/final/bc1.cell_heteroplasmic_df.tsv.gz
resources: tmpdir=/var/tmp
[Tue May 24 10:58:05 2022]
Finished job 0.
6 of 6 steps (100%) done
Complete log: .snakemake/log/2022-05-24T105549.715466.snakemake.log
bc1dmem/:
total 12K
drwxrwsr-x 2 dlin karsanlab 4.0K May 24 10:58 final
drwxrwsr-x 4 dlin karsanlab 4.0K May 12 18:02 logs
drwxrwsr-x 4 dlin karsanlab 4.0K May 12 17:49 qc
bc1dmem/final:
total 1.8M
-rw-rw-r-- 1 dlin karsanlab 96K May 24 10:56 bc1.A.txt.gz
-rw-rw-r-- 1 dlin karsanlab 519 May 24 10:58 bc1.cell_heteroplasmic_df.tsv.gz
-rw-rw-r-- 1 dlin karsanlab 194K May 24 10:56 bc1.coverage.txt.gz
-rw-rw-r-- 1 dlin karsanlab 96K May 24 10:56 bc1.C.txt.gz
-rw-rw-r-- 1 dlin karsanlab 76 May 24 10:56 bc1.depthTable.txt
-rw-rw-r-- 1 dlin karsanlab 50K May 24 10:56 bc1.G.txt.gz
-rw-rw-r-- 1 dlin karsanlab 537K May 24 10:58 bc1.rds
-rw-rw-r-- 1 dlin karsanlab 477K May 24 10:58 bc1.signac.rds
-rw-rw-r-- 1 dlin karsanlab 77K May 24 10:56 bc1.T.txt.gz
-rw-rw-r-- 1 dlin karsanlab 77K May 24 10:58 bc1.variant_stats.tsv.gz
-rw-rw-r-- 1 dlin karsanlab 27K May 24 10:58 bc1.vmr_strand_plot.png
-rw-rw-r-- 1 dlin karsanlab 119K May 24 10:55 chrM_refAllele.txt
bc1dmem/logs:
total 34M
-rw-rw-r-- 1 dlin karsanlab 4.1K May 24 10:58 base.mgatk.log
-rw-rw-r-- 1 dlin karsanlab 514 May 24 10:55 bc1.parameters.txt
-rw-rw-r-- 1 dlin karsanlab 34M May 24 10:58 bc1.snakemake_tenx.log
-rw-rw-r-- 1 dlin karsanlab 8.8K May 24 10:58 bc1.snakemake_tenx.stats
drwxrwsr-x 2 dlin karsanlab 4.0K May 12 17:37 filterlogs
drwxrwsr-x 2 dlin karsanlab 4.0K May 12 17:37 rmdupslogs
bc1dmem/logs/filterlogs:
total 0
-rw-rw-r-- 1 dlin karsanlab 22 May 24 10:55 barcodes.1.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 24 10:55 barcodes.2.filter.log
bc1dmem/logs/rmdupslogs:
total 8.0K
-rw-rw-r-- 1 dlin karsanlab 1.5K May 24 10:55 barcodes.1.rmdups.log
-rw-rw-r-- 1 dlin karsanlab 1.5K May 24 10:55 barcodes.2.rmdups.log
bc1dmem/qc:
total 8.0K
drwxrwsr-x 2 dlin karsanlab 4.0K May 24 10:56 depth
drwxrwsr-x 2 dlin karsanlab 4.0K May 12 17:37 quality
bc1dmem/qc/depth:
total 0
-rw-rw-r-- 1 dlin karsanlab 50 May 24 10:56 barcodes.1.depth.txt
-rw-rw-r-- 1 dlin karsanlab 26 May 24 10:56 barcodes.2.depth.txt
bc1dmem/qc/quality:
total 0
@caleblareau do these warnings affect the final output files?
/projects/karsanlab/dlin_dev/software/.conda/envs/mGATK/lib/python3.9/site-packages/mgatk/bin/python/variant_calling.py:38: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
rev_base_df[missing_pos] = 0
/projects/karsanlab/dlin_dev/software/.conda/envs/mGATK/lib/python3.9/site-packages/mgatk/bin/python/variant_calling.py:159: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
'mean_coverage', 'max_heteroplasmy']].astype(np.float)
/home/dlin/.conda/envs/mGATK/lib/python3.9/site-packages/pandas/core/arraylike.py:397: RuntimeWarning: divide by zero encountered in log10
These shouldn't impact the final output files at all. Looks like you should be set! Glad you were able to debug this.
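As a rough illustration of why the three warnings quoted above are cosmetic, here is a minimal sketch on toy data. The table, column names, and `missing_pos` list are made up for the example and are not taken from mgatk's `variant_calling.py`; the point is only that the patterns the warnings suggest yield the same kind of values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a per-position count table (illustrative names only).
base_df = pd.DataFrame({"pos_1": [3, 0], "pos_2": [1, 4]}, index=["cell_a", "cell_b"])
missing_pos = ["pos_3", "pos_4"]

# 1) PerformanceWarning: build the missing columns once and join them with a
#    single pd.concat call instead of inserting columns one at a time.
zeros = pd.DataFrame(0, index=base_df.index, columns=missing_pos)
filled_df = pd.concat([base_df, zeros], axis=1)

# 2) DeprecationWarning: `np.float` was only an alias for the builtin `float`,
#    so casting with `float` gives the same float64 dtype.
stats = filled_df.astype(float)

# 3) RuntimeWarning: log10 of zero evaluates to -inf rather than raising an
#    exception; np.errstate silences the warning for this block only.
with np.errstate(divide="ignore"):
    log_stats = np.log10(stats)

print(filled_df)
print(stats.dtypes.unique())
print(log_stats)
```

The first warning is about speed, the second about a renamed alias, and the third marks entries where a zero was passed to `log10`, which show up as `-inf` in the result.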
Describe the bug
The tool does not throw a specific error and appears to have worked, but the log file shows a traceback.
A summary of .log files
Post an ls -lRh of mgatk_output_folder
Describe the sequencing assay being analyzed
This is the test dataset and command provided by mgatk.
Clarify if the execution is successful on the test data provided in the repository
It does not work on the test data.
Additional context
Using python 3.7.3.