epi2me-labs / pychopper

cDNA read preprocessing

Pychopper stalling/hanging #46

Closed MustafaElshani closed 2 weeks ago

MustafaElshani commented 11 months ago

The issue might be related to https://github.com/epi2me-labs/pychopper/issues/11 and https://github.com/epi2me-labs/pychopper/issues/13.

I am running this on a SLURM-managed HPC; I did not have this issue previously when running locally. The files are loaded onto a fast SSD, and the job is given 64 CPUs and 70 GB of RAM.

I have tried different batch sizes: 1000, 10000 (the default), 100000, and 1000000 (although this last value was not respected and a random batch of 300000+ was used instead).
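For context, the shape of the invocation (a sketch only: -k, -t, and -B are flags listed by pychopper -h; the output path is a placeholder, not my exact command):

    pychopper -k PCS111 -t 64 -B 100000 ./fastq/SCR_LR.fastq SCR_LR.pychopper.fastq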

Issues noticed:

  1. Not all CPUs are ever utilised
  2. CPU usage rises to ~13% while a batch is being processed, then drops when it stalls
  3. RAM usage increases with batch size
  4. it/s increases with batch size
  5. Stalling happens when a batch completes
  6. The stall duration does not correlate with batch size.

I don't know if this is down to the HPC alone, but there seems to be a disconnect somewhere, since the allocated resources are never fully utilised.
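To pin down what the process is waiting on during a pause, a minimal diagnostic sketch that could be run on the compute node (pidstat and py-spy are suggestions and have to be installed separately; they are not part of pychopper):

    # Run on the compute node while the job is stalled.
    PID=$(pgrep -f pychopper)
    pidstat -p "$PID" 5         # per-process CPU usage, sampled every 5 s
    py-spy dump --pid "$PID"    # Python thread stacks at the moment of the stall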

nrhorner commented 10 months ago

Hi @MustafaElshani

I have not used Pychopper on HPC, but I will try to see if I can replicate this. In the meantime, please share any logs if you're able to.

MustafaElshani commented 10 months ago

Thank you for looking into this @nrhorner. Further to the comment above, I found a batch size of 50000 to be the best fit for this HPC (a node with 64 threads, with the file stored on SSD); however, it is still 4-5x slower than a local workstation with 40 threads.
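For reference, a minimal sketch of the kind of SLURM submission this corresponds to (illustrative only; the CPU and RAM figures and the input path come from the run described above, everything else is assumed):

    #!/bin/bash
    #SBATCH --job-name=pychopper
    #SBATCH --cpus-per-task=64
    #SBATCH --mem=70G

    # $SLURM_CPUS_PER_TASK keeps pychopper's thread count in step with
    # the allocation; -B 50000 is the batch size that worked best here.
    pychopper -k PCS111 -t "$SLURM_CPUS_PER_TASK" -B 50000 \
        ./fastq/SCR_LR.fastq SCR_LR.pychopper.fastq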

If you can let me know how to generate logs, I'm happy to share them.

nrhorner commented 10 months ago

Hi @MustafaElshani

Just the logging stdout, please.
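(Under SLURM that is usually just the job's output file, or an explicit redirect; a minimal sketch with placeholder file names:)

    #!/bin/bash
    #SBATCH --output=pychopper-%j.out   # %j expands to the SLURM job id
    # Or redirect explicitly inside the script:
    pychopper -k PCS111 -t 64 ./fastq/SCR_LR.fastq out.fastq > pychopper.log 2>&1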

MustafaElshani commented 10 months ago

The stdout for one of the runs:

Starting job on n19-32-192-hulk for sample NuTide701_001-132_SCR_LR

CPUs available: 64 
Activating Conda environment...
Processing sample SCR_LR
Concatenating FASTQ files for sample SCR_LR...
Starting Pychopper for sample SCR_LR...
Using kit: ***/envs/pychopper/lib/python3.8/site-packages/pychopper/primer_data/PCS111_primers.fas
Configurations to consider: "+:SSP,-VNP|-:VNP,-SSP"
Counting fastq records in input file: ./fastq/SCR_LR.fastq
Total fastq records in input file: 38366869
Tuning the cutoff parameter (q) on 9822 sampled reads (0.0%) passing quality filters (Q >= 7.0).
Optimizing over 30 cutoff values.
100%|██████████| 30/30 [06:26<00:00, 12.82s/it]
Best cutoff (q) value is 0.1724 with 95% of the reads classified.
Processing the whole dataset using a batch size of 50000:
  4%|▍         | 1550000/38366869 [21:30<1:14:21, 8252.72it/s]

There are pauses after every 50000-read batch, and it keeps that up; that doesn't happen on the local PC.

MustafaElshani commented 2 weeks ago

This was more of an issue with the HPC than with pychopper.