databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

PEPATACr::createStatsSummary Error #209

Open hadgie opened 2 years ago

hadgie commented 2 years ago

Hello, I'm completely new to this field and I have some ATAC-seq data to process. After cloning your repository and installing everything, I tried to run the examples/test_project/test_config.yaml configuration with looper, but I keep getting the same error:

PEPATACr::createStatsSummary(project_samples, results_subdir): Stats files missing for 1 samples.

It might be a basic mistake, but your answers could be a big help.

jpsmith5 commented 2 years ago

Hey @hadgie, can you share the specific command you ran? Was it looper runp examples/test_project/test_config.yaml or looper report examples/test_project/test_config.yaml? Had you already successfully run the looper run command on the config file? You can also share the PEPATAC_log.md file with me and I can look at that too. To attach it here on GitHub, I believe you'll first have to change the extension to .txt.

hadgie commented 2 years ago

Thanks for your reply @jpsmith5, the command I ran was looper runp examples/test_project/test_config.yaml. I just tried looper run and the following error was shown:

MacBookPro test_project % looper run /Users/heesuk/Desktop/atac/pepatac/examples/test_project/test_config.yaml   
Looper version: 1.3.1
Command: run
## [1 of 1] sample: test1; pipeline: PEPATAC
2 input files missing, job input size was not calculated accurately
> Not submitted: Missing files: ['examples/data/test1_r1.fastq.gz']
Writing 1 submission scripts for skipped samples

But when I head to examples/data/, I have the fastq.gz files:

test1_r1.fastq.gz   test1_r2.fastq.gz

In which directory can I find the PEPATAC_log.md?

hadgie commented 2 years ago

PEPATAC_collator_log.md

I could not find PEPATAC_log.md , but this might be what you were looking for! I'm not sure though..

jpsmith5 commented 2 years ago

Hey @hadgie, gotcha. Yeah, you'll need to perform the run step before the runp command. That's why it didn't see the stats.tsv file for the sample. The test configuration file also expects you to run it from within the pepatac/ folder, because the input paths in it (such as examples/data/test1_r1.fastq.gz) are relative to that folder. So try changing directory to your /Users/heesuk/Desktop/atac/pepatac/ folder and run the looper run examples/test_project/test_config.yaml command. Once that completes, you can do runp, and let me know if you hit a snag.
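To illustrate why the working directory matters here, the following is a minimal Python sketch, not looper's actual code. It resolves the relative input paths from the log above against two candidate working directories and lists the ones that cannot be found. The helper name `missing_inputs` is hypothetical; the paths are the ones quoted in this thread.

```python
from pathlib import Path


def missing_inputs(relative_paths, working_dir):
    """Return the relative paths that do not exist when resolved from working_dir.

    Hypothetical helper for illustration only; it is not part of looper or PEPATAC.
    """
    base = Path(working_dir)
    return [p for p in relative_paths if not (base / p).is_file()]


if __name__ == "__main__":
    # Relative input paths as they appear in the looper output above.
    inputs = [
        "examples/data/test1_r1.fastq.gz",
        "examples/data/test1_r2.fastq.gz",
    ]
    # Repository location taken from the thread; adjust to your own clone.
    repo_root = Path("/Users/heesuk/Desktop/atac/pepatac")
    for cwd in (repo_root, repo_root / "examples" / "test_project"):
        print(cwd, "-> missing:", missing_inputs(inputs, cwd))
```

Resolved from the repository root, both files should be found. Resolved from examples/test_project, the same strings point at examples/test_project/examples/data/, which does not exist, so both are reported missing.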

hadgie commented 2 years ago

Wow thanks to your help, I managed to run the script partly! Unfortunately now I have a different error with the following message:

psutil.ZombieProcess process still exists but it's a zombie (pid=20668)
Warning: couldn't add memory use for process: 20668

Is this an issue with my hardware? Is there a way I can fix it? I am working on an i9 MacBook Pro with 16g of memory.

jpsmith5 commented 2 years ago

Hmm, 16g should be more than enough, certainly for the test sample. The test sample in my hands maxed out at 6.5g of memory use, and that peak use comes from MACS2. Can you see at what point in the process this popped up? Was it during peak calling? It does suggest an insufficiency in memory availability but I'm not immediately clear where or why. You can certainly pass along your PEPATAC_log.md file and I'll look through that too. It would be in the pepatac_test/results_pipeline/test1/ folder.
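As a quick way to confirm what each sample folder contains, here is a small Python sketch. It assumes that per-sample outputs such as stats.tsv and PEPATAC_log.md sit directly in results_pipeline/<sample>/, as the folder mentioned above suggests. It only approximates the kind of check behind the earlier "Stats files missing" message and is not the PEPATACr implementation; `samples_missing_file` is a hypothetical name.

```python
from pathlib import Path


def samples_missing_file(results_subdir, sample_names, filename="stats.tsv"):
    """Return the samples whose folder under results_subdir lacks `filename`.

    Assumes a results_subdir/<sample>/<filename> layout (an assumption based on
    the paths in this thread, not on PEPATAC's source).
    """
    base = Path(results_subdir)
    return [s for s in sample_names if not (base / s / filename).is_file()]


if __name__ == "__main__":
    # Run from the pepatac/ folder so the relative path below resolves.
    results = "pepatac_test/results_pipeline"
    for name in ("stats.tsv", "PEPATAC_log.md"):
        print(name, "missing for:", samples_missing_file(results, ["test1"], name))
```

If test1 shows up as missing stats.tsv, the looper run step has not finished for that sample, and runp would be expected to fail in the same way as before.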

hadgie commented 2 years ago

PEPATAC_log.md

Thanks, I finally found the log file haha. I'm not sure, but maybe the memory error occurred during the alignment process?

jpsmith5 commented 2 years ago

Can you also share this file: /Users/heesuk/Desktop/atac/pepatac/pepatac_test/results_pipeline/test1/prealignments/test1_rCRSd_bt_aln_summary.log

I'm wondering if there's an error in bowtie2. Any messages from bowtie2 should end up in that file, and it can sometimes identify issues there.