tacazares closed this issue 1 year ago
Hey, I gave maxatac prepare this input:
`#!/bin/bash
module load anaconda3/1.0.0
source activate maxatac_test
maxatac prepare \
  -i /data/miraldiLab/team/kessi/snakeATAC/output_dir/Th0_48h_r1/alignments/Th0_48h_r1.bam \
  -o /data/miraldiLab/team/kessi/maxATAC \
  -prefix Th0_48h_r1 \
  --blacklist_bed /users/dia6sx/opt/maxatac/data/mm10/mm10_maxatac_blacklist.bed \
  --blacklist_bw /users/dia6sx/opt/maxatac/data/mm10/mm10_maxatac_blacklist.bw \
  --chrom_sizes /users/dia6sx/opt/maxatac/data/mm10/mm10.chrom.sizes \
  -chroms chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19`
`[2022-07-15 16:28:03,370]
Input file: /data/miraldiLab/team/kessi/snakeATAC/output_dir/Th0_48h_r1/alignments/Th0_48h_r1.bam
Input chromosome sizes file: /users/dia6sx/opt/maxatac/data/mm10/mm10.chrom.sizes
Tn5 cut sites will be slopped 20 bps on each side
Input blacklist file: /users/dia6sx/opt/maxatac/data/hg38/hg38_maxatac_blacklist.bw
Output filename: Th0_48h_r1
Output directory: /data/miraldiLab/team/kessi/maxATAC
Using a millions factor of: 20000000
Using 12 threads to run job.
[2022-07-15 16:28:03,471]
Generate the normalized signal tracks.
[2022-07-15 16:28:03,471]
Working on a bulk ATAC-seq BAM file
Getting the number of reads in the BAM file
[2022-07-15 16:28:10,236]
Processing BAM to bigwig. Running deduplication
fixmate: invalid option -- '@'
Usage: samtools fixmate
As elsewhere in samtools, use '-' as the filename for stdin/stdout. The input
file must be grouped by read name (e.g. sorted by name). Coordinated sorted
input is not accepted.
index: invalid option -- '@'
Usage: samtools index [-bc] [-m INT]
There seems to be a problem when preparing a BAM file from snakeATAC. Note that I am able to bypass the prepare step by using the bw file that was generated.
The `maxatac prepare` function was initially created as a convenience function for filtering, inferring Tn5 sites, and converting cut-site-level coverage to min-max normalized bigwig tracks in one step. This function calls the bash scripts that were used by our snakemake/cwl/bash workflows for ATAC-seq data processing. This will most likely be how most users prepare data, as opposed to going through each step individually, so we should think about improving the user experience.
A user ran `maxatac prepare` to prepare scATAC-seq fragment files. He was able to run the script to completion and did not get an error message about either of the following: 1) a problem with bedGraphToBigWig finding the correct libssl library (#102); 2) not all of the expected output files were produced.
The problem is that, to the user, the run appears to complete correctly even though the shell script hit an error along the way. We should add more logging information during processing. We should also look into whether to execute each shell command from Python, as opposed to having Python launch a single shell script that contains all of the commands. We could also add code to the shell script to catch problems with execution.
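As a rough illustration of the Python-side option, here is a minimal sketch of running one external command per step and refusing to continue when the command fails or an expected file is missing. The helper name `run_step`, the logger name, and the `expected_outputs` argument are hypothetical and not part of maxATAC. The sketch only assumes the standard-library `subprocess`, `logging`, and `pathlib` modules.

```python
import logging
import subprocess
from pathlib import Path

logger = logging.getLogger("prepare_sketch")  # hypothetical logger name


def run_step(name, command, expected_outputs=()):
    """Run one external command; fail loudly if it or its outputs go wrong."""
    logger.info("Starting step: %s", name)
    try:
        # check=True raises CalledProcessError on a non-zero exit status
        # instead of letting the workflow carry on as if nothing happened.
        result = subprocess.run(
            command, check=True, capture_output=True, text=True
        )
    except subprocess.CalledProcessError as err:
        logger.error(
            "Step %r failed with exit code %d:\n%s",
            name, err.returncode, err.stderr,
        )
        raise

    # A zero exit status is not enough: confirm that every expected file
    # exists and is non-empty before reporting success.
    paths = [Path(p) for p in expected_outputs]
    missing = [str(p) for p in paths if not p.is_file() or p.stat().st_size == 0]
    if missing:
        raise FileNotFoundError(
            f"Step {name!r} did not produce: {', '.join(missing)}"
        )
    for path in paths:
        logger.info("Saved %s (%d bytes)", path, path.stat().st_size)
    return result
```

If the commands stay inside a shell script instead, putting `set -euo pipefail` at the top of the script makes bash stop at the first failing command or failing pipeline stage, which Python can then detect from the script's exit status.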
[x] We should also add a test for `maxatac` functions that will make sure pybigwig is installed and can find numpy correctly, before running through the entire workflow. This is related to #96. We should point to the fix if the issue is detected.
[x] Double check that all of the unnecessary bedgraphs and intermediate files are removed to save space. We might want to add flags for whether to save specific intermediate files.
[x] Update and add better logging messages for different processes running. At least add messages for major events like saving files or removing files. We could also have a final printout that has the names and locations of all files and their sizes.
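The pybigwig check in the first item could look roughly like the sketch below. It is not the check that was added to maxATAC. The function name `check_pybigwig`, the injectable `import_module` argument (there only so the check can be exercised without pyBigWig installed), and the message wording are all hypothetical. It relies on the module-level `pyBigWig.numpy` flag, which pyBigWig documents as 1 when it was built with numpy support and 0 otherwise.

```python
import importlib


def check_pybigwig(import_module=importlib.import_module):
    """Return a list of problems; an empty list means the check passed."""
    try:
        pybigwig = import_module("pyBigWig")
    except ImportError as err:
        return [f"pyBigWig could not be imported: {err}"]

    # pyBigWig sets `numpy` to 1 when compiled with numpy support, else 0.
    if not getattr(pybigwig, "numpy", 0):
        return ["pyBigWig is installed but was built without numpy support"]
    return []
```

Calling `check_pybigwig()` before any processing starts and aborting when the returned list is non-empty would surface the problem up front. The messages are placeholders where a pointer to the fix discussed in #96 could go.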