caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

Improper folder specification #60

Closed (dy-lin closed 1 year ago)

dy-lin commented 2 years ago

Describe the bug

The command generates only the chrM_refAllele.txt.

mgatk tenx --input pbmc_granulocyte_sorted_3k_atac_possorted_bam.bam --output atac_rmdup --mito-genome hg38 --barcodes filtered_feature_bc_matrix/barcodes.tsv --barcode-tag CB --max-javamem=16000 

A summary of .log files

mgatk.snakemake_tenx.log

base.mgatk.log

Post an ls -lRh of mgatk_output_folder

atac_rmdup/:
total 12K
drwxrwsr-x 2 dlin karsanlab 4.0K May 13 11:22 final
drwxrwsr-x 4 dlin karsanlab 4.0K May 13 11:23 logs
drwxrwsr-x 3 dlin karsanlab 4.0K May 13 12:09 qc

atac_rmdup/final:
total 124K
-rw-rw-r-- 1 dlin karsanlab 119K May 13 12:07 chrM_refAllele.txt

atac_rmdup/logs:
total 252K
-rw-rw-r-- 1 dlin karsanlab  698 May 13 12:09 base.mgatk.log
drwxrwsr-x 2 dlin karsanlab  12K May 13 11:23 filterlogs
-rw-rw-r-- 1 dlin karsanlab  508 May 13 12:08 mgatk.parameters.txt
-rw-rw-r-- 1 dlin karsanlab 222K May 13 12:09 mgatk.snakemake_tenx.log
drwxrwsr-x 2 dlin karsanlab 4.0K May 13 11:23 rmdupslogs

atac_rmdup/logs/filterlogs:
total 0
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.100.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.101.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.102.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.103.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.104.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.105.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.106.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.107.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.108.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.109.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.10.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.110.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.111.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.112.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.113.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.114.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.115.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.116.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.117.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.118.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.119.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.11.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.120.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.121.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.122.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.123.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.124.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.125.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.126.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.127.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.128.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.129.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.12.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.130.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.131.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.132.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.133.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.134.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.135.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.136.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.137.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.138.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.139.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.13.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.140.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.141.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.142.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.143.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.14.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.15.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.16.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.17.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.18.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.19.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.1.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.20.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.21.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.22.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.23.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.24.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.25.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.26.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.27.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.28.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.29.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.2.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.30.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.31.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.32.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.33.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.34.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.35.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.36.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.37.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.38.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.39.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.3.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.40.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.41.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.42.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.43.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.44.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.45.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.46.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.47.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.48.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.49.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.4.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.50.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.51.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.52.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.53.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.54.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.55.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.56.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.57.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.58.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.59.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.5.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.60.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.61.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.62.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.63.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.64.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.65.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.66.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.67.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.68.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.69.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.6.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.70.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.71.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.72.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.73.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.74.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.75.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.76.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.77.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.78.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.79.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.7.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.80.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.81.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.82.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.83.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.84.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.85.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.86.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.87.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.88.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.89.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.8.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.90.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.91.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.92.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.93.filter.log
-rw-rw-r-- 1 dlin karsanlab 20 May 13 12:09 barcodes.94.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.95.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.96.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.97.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.98.filter.log
-rw-rw-r-- 1 dlin karsanlab 22 May 13 12:09 barcodes.99.filter.log
-rw-rw-r-- 1 dlin karsanlab 21 May 13 12:09 barcodes.9.filter.log

atac_rmdup/logs/rmdupslogs:
total 0

atac_rmdup/qc:
total 4.0K
drwxrwsr-x 2 dlin karsanlab 4.0K May 13 11:23 quality

atac_rmdup/qc/quality:
total 0

Describe the sequencing assay being analyzed

The assay is scATAC-seq from the 10X dataset. I'm directly downloading the BAM file, index, and barcode TSV from 10X. https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-3-k-1-standard-2-0-0

Clarify if the execution is successful on the test data provided in the repository

It does not work on the test data (see #59)

Additional context

I've already tried solutions in other issues such as --snake-stdout and --keep-duplicates #46.

dy-lin commented 2 years ago

I was at one point able to run mGATK without errors, but had the same situation in #59 where files were missing. I suspect something in my PATH environment changed, e.g. pip or conda packages got updated, or even a different python3 executable.

caleblareau commented 2 years ago

can you double check the barcodes files? mgatk only wants one item per line (corresponding to one barcode) but I think the multiome barcodes.tsv file has multiple entries (for both the ATAC and RNA barcode)?
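
A quick way to check the one-barcode-per-line expectation is sketched below. This is only an illustration, not part of mgatk: the function name `find_bad_barcode_lines` is hypothetical, and the barcode pattern is an assumption based on the barcodes quoted in this thread (for example `AAACAGCCAAATATCC-1`).

```python
import re
from pathlib import Path

# Assumed barcode shape: a nucleotide string with an optional "-<digits>" suffix.
BARCODE_RE = re.compile(r"^[ACGTN]+(-\d+)?$")


def find_bad_barcode_lines(path):
    """Return (line_number, text) pairs for lines that are not exactly one barcode."""
    bad = []
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        fields = line.split()
        # A valid line has a single whitespace-delimited field that looks like a barcode.
        if len(fields) != 1 or not BARCODE_RE.match(fields[0]):
            bad.append((lineno, line))
    return bad
```

An empty result suggests the file already has the expected shape; any returned pairs point at lines with extra columns, blank lines, or unexpected characters.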

dy-lin commented 2 years ago

The barcodes.tsv file looks like this. The multiome shares barcodes between RNA and ATAC. I've chosen to use a public multiome dataset (https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-3-k-1-standard-2-0-0) for this run, so you can attempt to reproduce the error as well.

$ head barcodes.tsv 
AAACAGCCAAATATCC-1
AAACAGCCAGGAACTG-1
AAACAGCCAGGCTTCG-1
AAACCAACACCTGCTC-1
AAACCAACAGATTCAT-1
AAACCAACAGTTGCGT-1
AAACCAACATAACGGG-1
AAACCAACATAGACCC-1
AAACCGCGTGAGGTAG-1
AAACGCGCATACCCGG-1

dy-lin commented 2 years ago

Ideally, I'd like to get the entire tool working, but if I could reproduce previous behaviour where the ACGT files and RDS would be correctly outputted but missing other files, that would work for the meantime too.

dy-lin commented 2 years ago

Here's an OLD run where the silent error persisted (found via snapshots), but the files such as mgatk.depthTable.txt were written out successfully. Because this file was obtained via snapshots, I do not have the mgatk.snakemake_tenx.log file because it was overwritten.

2022-04-21T132444.896479.snakemake.log

Here's a run from today where the silent error persisted: 2022-05-20T125435.443790.snakemake.log and mgatk.snakemake_tenx.log

Today I managed to get a silent error instead of the "Improper folder specification" error, but after tinkering with my pip freeze to get an accurate list of ~/.local/bin, it stopped working and I am once again getting the "Improper folder specification" error.

PeterCAllen commented 2 years ago

Hey Caleb,

If it helps, I'm getting the same 'Improper Folder Specification' error and have provided the snakemake log. I deleted the .snakemake folder before starting another run but am still running into the same issue. This was a DOGMA-Seq run. Is it possible that the format of the ATAC BAM is slightly different from a standard scATAC run like you would get with asap-seq?

Here are some of my barcodes for reference as well:

head ~/cluster/home/pcallen/projects/dogma_seq_pilot/data/20220429_dogma_run/counts_output/counts_mtDNA_masked/sample_3/outs/filtered_feature_bc_matrix/barcodes.tsv
AAACAGCCAACAGCCT-1
AAACAGCCACATAACT-1
AAACAGCCACATTAAC-1
AAACAGCCAGAATGAC-1
AAACAGCCAGGCATCT-1
AAACAGCCATCGTTCT-1
AAACATGCACCATATG-1
AAACATGCAGTAAAGC-1
AAACATGCATGGCCCA-1
AAACCAACAAATTCGT-1

sample3_test.snakemake_tenx.log

caleblareau commented 2 years ago

@PeterCAllen your error is a bit more tangible; I had to deal with this when we first had access to the multiome kit, and indeed there was an error in the release of CellRanger-ATAC / ARC v2 that we fixed in v0.6.3; can you double check your version?

@dy-lin I looked through your logs and couldn't see anything obvious other than that your number of slices seems to be much larger than mine; can you try with -c 8 or so to limit the slice number? Otherwise, since this is the standard multiome run, there are very low amounts of mtDNA, so cells that have essentially zero mitochondrial data may be messing this up. Can you supply the -bc file with only the top 100 barcodes based on mitochondrial reads, perhaps?

PeterCAllen commented 2 years ago

Hey @caleblareau, thanks for the quick reply. My version of mgatk is v0.6.6. In case it helps, here's the base log in addition to the parameter file: base.mgatk.log parameters.txt

dy-lin commented 2 years ago

@caleblareau if it helps, the data was run with CellRanger ARC v2.0.1, and I am using the post-CellRanger ARC BAMs from the 10X website (https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-3-k-1-standard-2-0-0).

I tried using -c 8 and still ran into the same error. I am concerned it has something to do with my installation (see #59), as it seems I am sometimes able to generate some files. Are you able to reproduce the error on a machine where you know the installation works?

dy-lin commented 2 years ago

Grabbing the top 100 barcodes:

samtools view pbmc_granulocyte_sorted_3k_atac_possorted_bam.bam "chrM" | grep -wf filtered_feature_bc_matrix/barcodes.tsv | grep -oE 'CB:[A-Z]:[ACGT]+-1' | cut -f3 -d: | sort | uniq -c | sort -nr | awk '{print $2}' | head -n 100 > barcodes.tsv

Still results in error: mgatk.snakemake_tenx.log

Is there any way to keep the temporary files from being deleted afterwards, so we can see which barcodes are causing issues?
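
For reference, the barcode-ranking step in the shell pipeline above can also be sketched in Python. This is a rough equivalent under stated assumptions, not something from mgatk: `top_barcodes` is a hypothetical helper, it expects SAM text lines (such as the output of `samtools view ... chrM`), and it reads the barcode from the `CB:Z:` tag rather than grepping the whole line, so counts could differ slightly from the shell version.

```python
import re
from collections import Counter

# Cell barcode stored as a string-typed optional SAM tag, e.g. "CB:Z:AAAC...-1".
CB_TAG = re.compile(r"\tCB:Z:(\S+)")


def top_barcodes(sam_lines, allowed, n=100):
    """Rank allowed barcodes by the number of SAM records carrying them in CB."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        match = CB_TAG.search(line)
        if match and match.group(1) in allowed:
            counts[match.group(1)] += 1
    # most_common sorts by descending count.
    return [barcode for barcode, _ in counts.most_common(n)]
```

In practice `sam_lines` could be `sys.stdin` when the script receives piped `samtools view` output, and `allowed` a set built from the filtered barcodes file.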

caleblareau commented 2 years ago

If you run with the -z flag, it keeps all of the temporary files; I’ll try to dig more into this later tonight / tomorrow.

dy-lin commented 2 years ago

I tried looking at the BAMs. I'm not sure which line in process_one_slice is throwing the error, but all the barcodes listed in the .txt are found in the input BAM.

<< 03:16 PM >> [ dlin@gphost03: barcode_files ] $ samtools view ../barcoded_bams/barcodes.1.bam | grep -oE 'CB:[A-Z]:[ACGT]+-1' | cut -f3 -d: | sort | uniq -c | sort -nr | awk '{print $2, $1}'
AACGCCCAGTTTGAGC-1 43896
TGAAACTGTGACATAT-1 32254
ATCCACCTCGTCAAGT-1 29902
TAGCTAGGTACGCGCA-1 27611
TGTGGCCAGCGAGTAA-1 25288
GGGTGAAGTCACAAAT-1 22872
CCACTTGGTGGTTCTT-1 22550
TTTCGTCCAGAGGGAG-1 20324
TCAATCGCATGCTTAG-1 19950
TTAAGGACAGGTTCAC-1 19618
ACCAGGGAGCGCTCAA-1 19399
TCGTAATCAGCCTAAC-1 18466
TACTGCACAAACTAAG-1 17567

dy-lin commented 2 years ago

I managed to find out why using --keep-duplicates let me get further but still eventually hit an error: the Java heap size was too small. I had to increase it to -Xmx10000000. After adjusting that, I'm encountering this error:

Traceback (most recent call last):
  File "/projects/karsanlab/dlin_dev/MoHCC/KARSANBIO-3015_scATAC-seq_Pipeline/KARSANBIO-3034_Testing_mGATK/mgatk/mgatk/bin/python/sumstatsBPtenx.py", line 80, in <module>
    writeSparseMatrixLetter("A", 0)
  File "/projects/karsanlab/dlin_dev/MoHCC/KARSANBIO-3015_scATAC-seq_Pipeline/KARSANBIO-3034_Testing_mGATK/mgatk/mgatk/bin/python/sumstatsBPtenx.py", line 68, in writeSparseMatrixLetter
    with open(out_file_fn,"w") as file_handle_fn:
FileNotFoundError: [Errno 2] No such file or directory: '/projects/karsanlab/dlin_dev/MoHCC/KARSANBIO-3015_scATAC-seq_Pipeline/KARSANBIO-3034_Testing_mGATK/multiome/mgatk_out/temp/sparse_matrices/barcodes.1.A.txt'
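
A `FileNotFoundError` from `open(..., "w")` generally means a parent directory of the target is missing, since opening a file for writing creates the file but not its directories. The sketch below is a small diagnostic under that assumption; `first_missing_parent` is a hypothetical helper, not an mgatk function.

```python
import os


def first_missing_parent(path):
    """Return the topmost missing directory above `path`, or None if all exist."""
    parent = os.path.dirname(os.path.abspath(path))
    missing = None
    # Walk upward until an existing directory is found.
    while not os.path.isdir(parent):
        missing = parent
        higher = os.path.dirname(parent)
        if higher == parent:  # reached the filesystem root; stop walking
            break
        parent = higher
    return missing
```

Calling it on the path from the traceback would show which level of the output tree is absent (for instance the `temp` folder itself versus only `sparse_matrices`).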

dy-lin commented 2 years ago

@caleblareau After some investigation, I have managed to somewhat resolve my issue. I believe that some folders are not being created: /temp/sparse_matrices/ and /temp/qc/depth/

By adding these two lines after https://github.com/caleblareau/mgatk/blob/f0a27c1b57a27f3bf6af29c60bc89b35ea73c08b/mgatk/bin/snake/Snakefile.tenx#L75

                if not os.path.exists(outdir + "/temp/sparse_matrices/"):
                        os.mkdir(outdir + "/temp/sparse_matrices/")

and these two lines after https://github.com/caleblareau/mgatk/blob/f0a27c1b57a27f3bf6af29c60bc89b35ea73c08b/mgatk/bin/python/sumstatsBPtenx.py#L86

if not os.path.exists(os.path.dirname(out_file_depth)):
    os.mkdir(os.path.dirname(out_file_depth))

I was able to bypass these errors, and the files seem to be created correctly:

(mGATK) << 12:09 PM >> [ dlin@gphost03: multiome ] $ ls -lRh mgatk_out_fix/
mgatk_out_fix/:
total 20K
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:41 fasta
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:45 final
drwxrwsr-x 4 dlin karsanlab 4.0K May 30 11:44 logs
drwxrwsr-x 4 dlin karsanlab 4.0K May 30 11:41 qc
drwxrwsr-x 8 dlin karsanlab 4.0K May 30 11:41 temp

mgatk_out_fix/fasta:
total 20K
-rw-rw-r-- 1 dlin karsanlab 17K May 30 11:41 chrM.fasta
-rw-rw-r-- 1 dlin karsanlab  19 May 30 11:41 chrM.fasta.fai

mgatk_out_fix/final:
total 192M
-rw-rw-r-- 1 dlin karsanlab 119K May 30 11:41 chrM_refAllele.txt
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:43 mgatk.A.txt.gz
-rw-rw-r-- 1 dlin karsanlab 2.2K May 30 11:44 mgatk.cell_heteroplasmic_df.tsv.gz
-rw-rw-r-- 1 dlin karsanlab  36M May 30 11:43 mgatk.coverage.txt.gz
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:43 mgatk.C.txt.gz
-rw-rw-r-- 1 dlin karsanlab  64K May 30 11:42 mgatk.depthTable.txt
-rw-rw-r-- 1 dlin karsanlab 6.1M May 30 11:43 mgatk.G.txt.gz
-rw-rw-r-- 1 dlin karsanlab  44M May 30 11:45 mgatk.rds
-rw-rw-r-- 1 dlin karsanlab  69M May 30 11:45 mgatk.signac.rds
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:43 mgatk.T.txt.gz
-rw-rw-r-- 1 dlin karsanlab 145K May 30 11:44 mgatk.variant_stats.tsv.gz
-rw-rw-r-- 1 dlin karsanlab  28K May 30 11:44 mgatk.vmr_strand_plot.png

mgatk_out_fix/logs:
total 20M
-rw-rw-r-- 1 dlin karsanlab  373 May 30 11:45 base.mgatk.log
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:42 filterlogs
-rw-rw-r-- 1 dlin karsanlab  558 May 30 11:41 mgatk.parameters.txt
-rw-rw-r-- 1 dlin karsanlab  20M May 30 11:44 mgatk.snakemake_tenx.log
-rw-rw-r-- 1 dlin karsanlab  23K May 30 11:44 mgatk.snakemake_tenx.stats
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:42 rmdupslogs

mgatk_out_fix/logs/filterlogs:
total 0
-rw-rw-r-- 1 dlin karsanlab 24 May 30 11:42 barcodes.1.filter.log
-rw-rw-r-- 1 dlin karsanlab 24 May 30 11:42 barcodes.2.filter.log
-rw-rw-r-- 1 dlin karsanlab 24 May 30 11:42 barcodes.3.filter.log
-rw-rw-r-- 1 dlin karsanlab 24 May 30 11:42 barcodes.4.filter.log
-rw-rw-r-- 1 dlin karsanlab 24 May 30 11:42 barcodes.5.filter.log
-rw-rw-r-- 1 dlin karsanlab 24 May 30 11:42 barcodes.6.filter.log
-rw-rw-r-- 1 dlin karsanlab 24 May 30 11:42 barcodes.7.filter.log
-rw-rw-r-- 1 dlin karsanlab 24 May 30 11:42 barcodes.8.filter.log

mgatk_out_fix/logs/rmdupslogs:
total 32K
-rw-rw-r-- 1 dlin karsanlab 2.9K May 30 11:42 barcodes.1.rmdups.log
-rw-rw-r-- 1 dlin karsanlab 2.9K May 30 11:42 barcodes.2.rmdups.log
-rw-rw-r-- 1 dlin karsanlab 2.9K May 30 11:42 barcodes.3.rmdups.log
-rw-rw-r-- 1 dlin karsanlab 2.9K May 30 11:42 barcodes.4.rmdups.log
-rw-rw-r-- 1 dlin karsanlab 2.9K May 30 11:42 barcodes.5.rmdups.log
-rw-rw-r-- 1 dlin karsanlab 2.9K May 30 11:42 barcodes.6.rmdups.log
-rw-rw-r-- 1 dlin karsanlab 2.8K May 30 11:42 barcodes.7.rmdups.log
-rw-rw-r-- 1 dlin karsanlab 2.9K May 30 11:42 barcodes.8.rmdups.log

mgatk_out_fix/qc:
total 8.0K
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:42 depth
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:41 quality

mgatk_out_fix/qc/depth:
total 64K
-rw-rw-r-- 1 dlin karsanlab 8.0K May 30 11:42 barcodes.1.depth.txt
-rw-rw-r-- 1 dlin karsanlab 8.0K May 30 11:42 barcodes.2.depth.txt
-rw-rw-r-- 1 dlin karsanlab 8.0K May 30 11:42 barcodes.3.depth.txt
-rw-rw-r-- 1 dlin karsanlab 8.0K May 30 11:42 barcodes.4.depth.txt
-rw-rw-r-- 1 dlin karsanlab 8.0K May 30 11:42 barcodes.5.depth.txt
-rw-rw-r-- 1 dlin karsanlab 8.0K May 30 11:42 barcodes.6.depth.txt
-rw-rw-r-- 1 dlin karsanlab 8.0K May 30 11:42 barcodes.7.depth.txt
-rw-rw-r-- 1 dlin karsanlab 7.9K May 30 11:42 barcodes.8.depth.txt

mgatk_out_fix/qc/quality:
total 0

mgatk_out_fix/temp:
total 24K
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:41 barcoded_bams
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:41 barcode_files
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:41 quality
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:42 ready_bam
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:42 sparse_matrices
drwxrwsr-x 2 dlin karsanlab 4.0K May 30 11:42 temp_bam

mgatk_out_fix/temp/barcoded_bams:
total 108M
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:41 barcodes.1.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:41 barcodes.1.bam.bai
-rw-rw-r-- 1 dlin karsanlab  15M May 30 11:41 barcodes.2.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:41 barcodes.2.bam.bai
-rw-rw-r-- 1 dlin karsanlab  12M May 30 11:41 barcodes.3.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:41 barcodes.3.bam.bai
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:41 barcodes.4.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:41 barcodes.4.bam.bai
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:41 barcodes.5.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:41 barcodes.5.bam.bai
-rw-rw-r-- 1 dlin karsanlab  17M May 30 11:41 barcodes.6.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:41 barcodes.6.bam.bai
-rw-rw-r-- 1 dlin karsanlab  16M May 30 11:41 barcodes.7.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:41 barcodes.7.bam.bai
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:41 barcodes.8.bam
-rw-rw-r-- 1 dlin karsanlab 1.7K May 30 11:41 barcodes.8.bam.bai

mgatk_out_fix/temp/barcode_files:
total 64K
-rw-rw-r-- 1 dlin karsanlab 6.3K May 30 11:41 barcodes.1.txt
-rw-rw-r-- 1 dlin karsanlab 6.3K May 30 11:41 barcodes.2.txt
-rw-rw-r-- 1 dlin karsanlab 6.3K May 30 11:41 barcodes.3.txt
-rw-rw-r-- 1 dlin karsanlab 6.3K May 30 11:41 barcodes.4.txt
-rw-rw-r-- 1 dlin karsanlab 6.3K May 30 11:41 barcodes.5.txt
-rw-rw-r-- 1 dlin karsanlab 6.3K May 30 11:41 barcodes.6.txt
-rw-rw-r-- 1 dlin karsanlab 6.3K May 30 11:41 barcodes.7.txt
-rw-rw-r-- 1 dlin karsanlab 6.3K May 30 11:41 barcodes.8.txt

mgatk_out_fix/temp/quality:
total 0

mgatk_out_fix/temp/ready_bam:
total 80M
-rw-rw-r-- 1 dlin karsanlab 9.9M May 30 11:42 barcodes.1.qc.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.1.qc.bam.bai
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:42 barcodes.2.qc.bam
-rw-rw-r-- 1 dlin karsanlab 1.7K May 30 11:42 barcodes.2.qc.bam.bai
-rw-rw-r-- 1 dlin karsanlab 8.9M May 30 11:42 barcodes.3.qc.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.3.qc.bam.bai
-rw-rw-r-- 1 dlin karsanlab 8.1M May 30 11:42 barcodes.4.qc.bam
-rw-rw-r-- 1 dlin karsanlab 1.7K May 30 11:42 barcodes.4.qc.bam.bai
-rw-rw-r-- 1 dlin karsanlab 9.7M May 30 11:42 barcodes.5.qc.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.5.qc.bam.bai
-rw-rw-r-- 1 dlin karsanlab  12M May 30 11:42 barcodes.6.qc.bam
-rw-rw-r-- 1 dlin karsanlab 1.7K May 30 11:42 barcodes.6.qc.bam.bai
-rw-rw-r-- 1 dlin karsanlab  12M May 30 11:42 barcodes.7.qc.bam
-rw-rw-r-- 1 dlin karsanlab 1.7K May 30 11:42 barcodes.7.qc.bam.bai
-rw-rw-r-- 1 dlin karsanlab 9.1M May 30 11:42 barcodes.8.qc.bam
-rw-rw-r-- 1 dlin karsanlab 1.7K May 30 11:42 barcodes.8.qc.bam.bai

mgatk_out_fix/temp/sparse_matrices:
total 657M
-rw-rw-r-- 1 dlin karsanlab  12M May 30 11:42 barcodes.1.A.txt
-rw-rw-r-- 1 dlin karsanlab  36M May 30 11:42 barcodes.1.coverage.txt
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:42 barcodes.1.C.txt
-rw-rw-r-- 1 dlin karsanlab 5.1M May 30 11:42 barcodes.1.G.txt
-rw-rw-r-- 1 dlin karsanlab 9.5M May 30 11:42 barcodes.1.T.txt
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.2.A.txt
-rw-rw-r-- 1 dlin karsanlab  40M May 30 11:42 barcodes.2.coverage.txt
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.2.C.txt
-rw-rw-r-- 1 dlin karsanlab 5.7M May 30 11:42 barcodes.2.G.txt
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:42 barcodes.2.T.txt
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:42 barcodes.3.A.txt
-rw-rw-r-- 1 dlin karsanlab  39M May 30 11:42 barcodes.3.coverage.txt
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.3.C.txt
-rw-rw-r-- 1 dlin karsanlab 5.5M May 30 11:42 barcodes.3.G.txt
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:42 barcodes.3.T.txt
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:42 barcodes.4.A.txt
-rw-rw-r-- 1 dlin karsanlab  37M May 30 11:42 barcodes.4.coverage.txt
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:42 barcodes.4.C.txt
-rw-rw-r-- 1 dlin karsanlab 5.2M May 30 11:42 barcodes.4.G.txt
-rw-rw-r-- 1 dlin karsanlab 9.7M May 30 11:42 barcodes.4.T.txt
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:42 barcodes.5.A.txt
-rw-rw-r-- 1 dlin karsanlab  40M May 30 11:42 barcodes.5.coverage.txt
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.5.C.txt
-rw-rw-r-- 1 dlin karsanlab 5.6M May 30 11:42 barcodes.5.G.txt
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:42 barcodes.5.T.txt
-rw-rw-r-- 1 dlin karsanlab  15M May 30 11:42 barcodes.6.A.txt
-rw-rw-r-- 1 dlin karsanlab  45M May 30 11:42 barcodes.6.coverage.txt
-rw-rw-r-- 1 dlin karsanlab  16M May 30 11:42 barcodes.6.C.txt
-rw-rw-r-- 1 dlin karsanlab 6.3M May 30 11:42 barcodes.6.G.txt
-rw-rw-r-- 1 dlin karsanlab  12M May 30 11:42 barcodes.6.T.txt
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.7.A.txt
-rw-rw-r-- 1 dlin karsanlab  40M May 30 11:42 barcodes.7.coverage.txt
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.7.C.txt
-rw-rw-r-- 1 dlin karsanlab 5.7M May 30 11:42 barcodes.7.G.txt
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:42 barcodes.7.T.txt
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.8.A.txt
-rw-rw-r-- 1 dlin karsanlab  41M May 30 11:42 barcodes.8.coverage.txt
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.8.C.txt
-rw-rw-r-- 1 dlin karsanlab 5.8M May 30 11:42 barcodes.8.G.txt
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:42 barcodes.8.T.txt

mgatk_out_fix/temp/temp_bam:
total 217M
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.1.temp0.bam
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.1.temp1.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.1.temp1.bam.bai
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.2.temp0.bam
-rw-rw-r-- 1 dlin karsanlab  15M May 30 11:42 barcodes.2.temp1.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.2.temp1.bam.bai
-rw-rw-r-- 1 dlin karsanlab  12M May 30 11:42 barcodes.3.temp0.bam
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:42 barcodes.3.temp1.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.3.temp1.bam.bai
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:42 barcodes.4.temp0.bam
-rw-rw-r-- 1 dlin karsanlab  11M May 30 11:42 barcodes.4.temp1.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.4.temp1.bam.bai
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:42 barcodes.5.temp0.bam
-rw-rw-r-- 1 dlin karsanlab  14M May 30 11:42 barcodes.5.temp1.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.5.temp1.bam.bai
-rw-rw-r-- 1 dlin karsanlab  17M May 30 11:42 barcodes.6.temp0.bam
-rw-rw-r-- 1 dlin karsanlab  17M May 30 11:42 barcodes.6.temp1.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.6.temp1.bam.bai
-rw-rw-r-- 1 dlin karsanlab  16M May 30 11:42 barcodes.7.temp0.bam
-rw-rw-r-- 1 dlin karsanlab  16M May 30 11:42 barcodes.7.temp1.bam
-rw-rw-r-- 1 dlin karsanlab 1.8K May 30 11:42 barcodes.7.temp1.bam.bai
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:42 barcodes.8.temp0.bam
-rw-rw-r-- 1 dlin karsanlab  13M May 30 11:42 barcodes.8.temp1.bam
-rw-rw-r-- 1 dlin karsanlab 1.7K May 30 11:42 barcodes.8.temp1.bam.bai

The issue that remains is that the snakemake logfile still seems to think something is wrong, and thinks the output files aren't being produced when they are (see the ls -lhR above): mgatk.snakemake_tenx.log

[Mon May 30 11:41:55 2022]
rule process_one_slice:
    input: mgatk_out_fix/.internal/samples/barcodes.2.bam.txt
    output: mgatk_out_fix/qc/depth/barcodes.2.depth.txt, mgatk_out_fix/temp/sparse_matrices/barcodes.2.A.txt, mgatk_out_fix/temp/sparse_matrices/barcodes.2.C.txt, mgatk_out_fix/temp/sparse_matrices/barcodes.2.G.txt, mgatk_out_fix/temp/sparse_matrices/barcodes.2.T.txt, mgatk_out_fix/temp/sparse_matrices/barcodes.2.coverage.txt
    jobid: 3
    reason: Missing output files: mgatk_out_fix/temp/sparse_matrices/barcodes.2.G.txt, mgatk_out_fix/qc/depth/barcodes.2.depth.txt, mgatk_out_fix/temp/sparse_matrices/barcodes.2.coverage.txt, mgatk_out_fix/temp/sparse_matrices/barcodes.2.A.txt, mgatk_out_fix/temp/sparse_matrices/barcodes.2.C.txt, mgatk_out_fix/temp/sparse_matrices/barcodes.2.T.txt
    wildcards: sample=barcodes.2
    resources: tmpdir=/var/tmp
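
One possible reading of this entry: Snakemake normally prints the `reason:` line when it schedules a job, to say why the rule has to run, so "Missing output files" may only describe the state before the job executed rather than a failure afterwards. A minimal sketch for checking which of the listed files are really absent, assuming the log format shown above (the helper name `missing_outputs` is hypothetical and not part of mgatk or Snakemake):

```python
import os
import re


def missing_outputs(log_text, root="."):
    """Return paths listed in 'Missing output files' lines that do not exist under root."""
    absent = set()
    for line in log_text.splitlines():
        match = re.search(r"reason: Missing output files: (.*)", line)
        if not match:
            continue
        # Further reasons may follow after a semicolon; keep only the file list.
        listed = match.group(1).split(";")[0]
        for path in listed.split(","):
            path = path.strip()
            if path and not os.path.exists(os.path.join(root, path)):
                absent.add(path)
    return sorted(absent)


# Hypothetical one-line sample in the same shape as the log above.
sample = "    reason: Missing output files: a/b.txt, a/c.txt; Code has changed since last execution"
print(missing_outputs(sample, root="."))
```

Running this on the full text of mgatk.snakemake_tenx.log after the run finishes, with `root` set to the directory mgatk was launched from, should return an empty list if every file Snakemake listed was eventually written.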

And again, the 'error messages' from #59 also persist here:

/projects/karsanlab/dlin_dev/software/.conda/envs/mGATK/lib/python3.9/site-packages/mgatk/bin/python/variant_calling.py:38: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  rev_base_df[missing_pos] = 0
/projects/karsanlab/dlin_dev/software/.conda/envs/mGATK/lib/python3.9/site-packages/mgatk/bin/python/variant_calling.py:159: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  'mean_coverage', 'max_heteroplasmy']].astype(np.float)
/home/dlin/.conda/envs/mGATK/lib/python3.9/site-packages/pandas/core/arraylike.py:397: RuntimeWarning: divide by zero encountered in log10
  result = getattr(ufunc, method)(*inputs, **kwargs)
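
As a rough illustration of what these messages mean, assuming only that numpy is installed: the `np.float` alias was equivalent to the builtin `float` (it was removed in NumPy 1.24, where the same line would raise an `AttributeError` instead of warning), and `log10` of zero returns `-inf` with a warning rather than stopping. The pandas `PerformanceWarning` concerns speed only, per its own text. This is a standalone sketch, not code from mgatk's variant_calling.py:

```python
import warnings

import numpy as np

values = np.array([0, 1, 10])

# astype(float) produces the same dtype the deprecated np.float alias did.
as_float = values.astype(float)
print(as_float.dtype == np.float64)

# log10(0) emits a RuntimeWarning and yields -inf; the other entries are unaffected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    logged = np.log10(as_float)

print(logged)
print([str(w.message) for w in caught])
```

Whether a `-inf` value matters downstream depends on how the affected column is used, which this sketch does not cover.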

So I guess my final questions are:

  1. Are the directories normally created elsewhere in the code, and did that step get skipped because of the specific flags I used? Either way I can create a pull request if needed.
  2. Do these runtime errors and warnings affect the output files?
  3. Why does Snakemake think that the output files were not created when they were?

caleblareau commented 2 years ago

@dy-lin thanks for the effort here :) Just to confirm: did you rerun with the same output folder already existing, or do you still get these errors (1-3) when the output folder doesn't exist beforehand? A pre-existing output folder is the only cause I can think of off the top of my head for this error.

dy-lin commented 2 years ago

I'll re-run both with and without an existing output directory, see whether that clears the Snakemake log, and report back!

dy-lin commented 2 years ago

Looks like in either case, Snakemake still reports files as missing even though they aren't. However, I discovered an installation quirk: if I first install all the dependencies (pysam=0.19.0, numpy, matplotlib) with conda and then install mGATK with the conda environment's pip, mGATK does not run into the 'improper folder specification' error (without any of the code modifications I referred to above). ¯\_(ツ)_/¯
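
Since the behaviour seems to depend on how the environment was built, recording the installed versions in a working and a failing environment may help narrow it down. A minimal sketch using only the standard library (Python 3.8+); the package list is an assumption based on the dependencies named above, and `installed_versions` is a hypothetical helper:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["mgatk", "pysam", "numpy", "matplotlib", "snakemake"]))
```

Run inside each conda environment, the two printed dictionaries can be compared directly.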

ollieeknight commented 1 year ago

yeah, I'm now running into this issue:

[Mon Apr 17 21:20:00 2023]
rule process_one_slice:
    input: INT1_DOGMA/mgatk/.internal/samples/barcodes.19.bam.txt
    output: INT1_DOGMA/mgatk/qc/depth/barcodes.19.depth.txt, INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.A.txt, INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.C.txt, INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.G.txt, INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.T.txt, INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.coverage.txt
    jobid: 28
    reason: Missing output files: INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.coverage.txt, INT1_DOGMA/mgatk/qc/depth/barcodes.19.depth.txt, INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.T.txt, INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.G.txt, INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.A.txt, INT1_DOGMA/mgatk/temp/sparse_matrices/barcodes.19.C.txt; Code has changed since last execution
    wildcards: sample=barcodes.19
    resources: tmpdir=/fast/users/knighto_c/scratch/tmp

RuleException:
SamtoolsError in file /fast/work/users/knighto_c/bin/miniconda3/envs/mito/lib/python3.9/site-packages/mgatk/bin/snake/Snakefile.tenx, line 113:
'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "INT1_DOGMA/mgatk/temp/ready_bam/barcodes.15.qc.bam"\n'
  File "/fast/work/users/knighto_c/bin/miniconda3/envs/mito/lib/python3.9/site-packages/mgatk/bin/snake/Snakefile.tenx", line 113, in __rule_process_one_slice
  File "/data/gpfs-1/users/knighto_c/work/bin/miniconda3/envs/mito/lib/python3.9/site-packages/pysam/utils.py", line 69, in __call__
  File "/data/gpfs-1/users/knighto_c/work/bin/miniconda3/envs/mito/lib/python3.9/concurrent/futures/thread.py", line 58, in run

and then eventually,

Error in checkGrep(grep(".A.txt", files)) :
  Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep
Execution halted
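
Judging from the wording of this error, the R import step appears to expect exactly one file per pattern (`.A.txt`, and presumably the other bases and coverage) in the folder it reads. A minimal sketch of the same kind of check, where the pattern list is an assumption taken from the output names in the Snakemake log above and `count_matches` / `problems` are hypothetical helpers, not mgatk functions:

```python
import os

# Assumed patterns, based on the sparse matrix names in the log above.
PATTERNS = [".A.txt", ".C.txt", ".G.txt", ".T.txt", ".coverage.txt"]


def count_matches(folder, patterns=PATTERNS):
    """Count file names in folder that contain each pattern as a substring."""
    names = os.listdir(folder)
    return {p: sum(p in name for name in names) for p in patterns}


def problems(folder, patterns=PATTERNS):
    """Return the patterns that do not match exactly one file, with their counts."""
    return {p: n for p, n in count_matches(folder, patterns).items() if n != 1}
```

Calling `problems(...)` on whichever folder the import step reads would show a count of 0 for a file that was never written (for example after the `samtools index` failure above) and a count above 1 when leftovers from an earlier run are present.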

@dy-lin, could you post the exact commands you used to create the mgatk environment, and then run genotyping? thanks so much!

ollieeknight commented 1 year ago

alright, it's not very helpful as I can't fully figure out what's going on, but this is how I solved it:

# clean miniconda3 installation
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p miniconda3 && rm Miniconda3-latest-Linux-x86_64.sh
source miniconda3/etc/profile.d/conda.sh && conda activate
conda install mamba

# create mgatk env
mamba create -y -n mito openjdk r-data.table r-matrix bioconductor-genomicranges bioconductor-summarizedexperiment
conda activate mito
pip install mgatk

# DOGMA-seq run, so the barcodes file needs to be gunzipped
sample_id=folder_name
# work on a copy so the original barcodes.tsv.gz is left untouched
cp "$sample_id/outs/filtered_feature_bc_matrix/barcodes.tsv.gz" "$sample_id/outs/filtered_feature_bc_matrix/barcodes1.tsv.gz"
gunzip "$sample_id/outs/filtered_feature_bc_matrix/barcodes1.tsv.gz"
mgatk tenx -i "$sample_id/outs/atac_possorted_bam.bam" -n "$sample_id" -o "$sample_id/mgatk" -c 8 -bt CB -b "$sample_id/outs/filtered_feature_bc_matrix/barcodes1.tsv" -z
# remove the temporary decompressed copy
rm "$sample_id/outs/filtered_feature_bc_matrix/barcodes1.tsv"

The critical part, I found, was specifying only 8 cores (-c 8) for mgatk tenx.