CCBR / ASPEN

CCBR pipeline for preliminary QC and peak calling from ATACseq datasets 🌲
https://ccbr.github.io/ASPEN/
MIT License

sample error with samples > 39 #8

Closed slsevilla closed 1 year ago

slsevilla commented 1 year ago

I kicked off a run yesterday without a problem (N=17). I then tried a second run with more samples (N=40) and keep getting an error during init.smk.

I've been trying to troubleshoot it, and the only thing I can figure out is that it runs fine with 39 samples, but as soon as I add the remaining sample, it errors. It doesn't matter which sample is at the end; it always errors.

Specifically, the error happens in init.smk, when it's going through the reps to make sure that R1/R2 exist. I edited it to print out the rep, R1, and R2 values.

This works as normal for sample N=38:

rep:  NHP27_TP4_CD14negCD16pos
R1:  /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s788_r1.fq.gz
R2:  /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s788_r2.fq.gz

But then, when it gets to sample N=39, it does this:

rep:  NHP21_TP1_CD14negCD16pos
R1:  replicateName
NHP21_TP1_CD14negCD16pos    /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s...
NHP21_TP1_CD14negCD16pos    /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s...
NHP21_TP1_CD14negCD16pos    /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s...
Name: path_to_R1_fastq, dtype: object
R2:  replicateName
NHP21_TP1_CD14negCD16pos    /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s...
NHP21_TP1_CD14negCD16pos    /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s...
NHP21_TP1_CD14negCD16pos    /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s...
Name: path_to_R2_fastq, dtype: object

It seems to be repeating the entire df again. If I run with N=39 it prints as expected.

rep:  NHP27_TP4_CD14negCD16pos
R1:  /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s788_r1.fq.gz
R2:  /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s788_r2.fq.gz
rep:  NHP21_TP1_CD14negCD16pos
R1:  /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s791_r1.fq.gz
R2:  /data/NCI_VB/rawdata/CCRVB-13/190321_0539/fq/s791_r2.fq.gz

I've also printed out the replicates, and those print as expected for both runs. N=39:

['NHP17_TP1_CD14posDRneg', 'NHP17_TP1_CD14posDRpos', 'NHP17_TP2_CD14posDRneg', 'NHP17_TP2_CD14posDRpos', 'NHP17_TP3_CD14posDRneg', 'NHP17_TP3_CD14posDRpos', 'NHP17_TP4_CD14posDRneg', 'NHP17_TP4_CD14posDRpos', 'NHP10_TP2_CD14posDRneg', 'NHP10_TP2_CD14posDRpos', 'NHP10_TP3_CD14posDRneg', 'NHP10_TP3_CD14posDRpos', 'NHP10_TP4_CD14posDRneg', 'NHP10_TP4_CD14posDRpos', 'NHP27_TP1_CD14posDRneg', 'NHP27_TP1_CD14posDRpos', 'NHP27_TP2_CD14posDRneg', 'NHP27_TP2_CD14posDRpos', 'NHP27_TP3_CD14posDRneg', 'NHP27_TP3_CD14posDRpos', 'NHP27_TP4_CD14posDRneg', 'NHP27_TP4_CD14posDRpos', 'NHP21_TP1_CD14posDRneg', 'NHP21_TP1_CD14posDRpos', 'NHP22_TP1_CD14posDRneg', 'NHP22_TP1_CD14posDRpos', 'NHP17_TP1_CD14negCD16pos', 'NHP17_TP2_CD14negCD16pos', 'NHP17_TP3_CD14negCD16pos', 'NHP17_TP4_CD14negCD16pos', 'NHP10_TP1_CD14negCD16pos', 'NHP10_TP2_CD14negCD16pos', 'NHP10_TP3_CD14negCD16pos', 'NHP10_TP4_CD14negCD16pos', 'NHP27_TP1_CD14negCD16pos', 'NHP27_TP2_CD14negCD16pos', 'NHP27_TP3_CD14negCD16pos', 'NHP27_TP4_CD14negCD16pos', 'NHP21_TP1_CD14negCD16pos']

N=40:

['NHP17_TP1_CD14posDRneg', 'NHP17_TP1_CD14posDRpos', 'NHP17_TP2_CD14posDRneg', 'NHP17_TP2_CD14posDRpos', 'NHP17_TP3_CD14posDRneg', 'NHP17_TP3_CD14posDRpos', 'NHP17_TP4_CD14posDRneg', 'NHP17_TP4_CD14posDRpos', 'NHP10_TP2_CD14posDRneg', 'NHP10_TP2_CD14posDRpos', 'NHP10_TP3_CD14posDRneg', 'NHP10_TP3_CD14posDRpos', 'NHP10_TP4_CD14posDRneg', 'NHP10_TP4_CD14posDRpos', 'NHP27_TP1_CD14posDRneg', 'NHP27_TP1_CD14posDRpos', 'NHP27_TP2_CD14posDRneg', 'NHP27_TP2_CD14posDRpos', 'NHP27_TP3_CD14posDRneg', 'NHP27_TP3_CD14posDRpos', 'NHP27_TP4_CD14posDRneg', 'NHP27_TP4_CD14posDRpos', 'NHP21_TP1_CD14posDRneg', 'NHP21_TP1_CD14posDRpos', 'NHP22_TP1_CD14posDRneg', 'NHP22_TP1_CD14posDRpos', 'NHP17_TP1_CD14negCD16pos', 'NHP17_TP2_CD14negCD16pos', 'NHP17_TP3_CD14negCD16pos', 'NHP17_TP4_CD14negCD16pos', 'NHP10_TP1_CD14negCD16pos', 'NHP10_TP2_CD14negCD16pos', 'NHP10_TP3_CD14negCD16pos', 'NHP10_TP4_CD14negCD16pos', 'NHP27_TP1_CD14negCD16pos', 'NHP27_TP2_CD14negCD16pos', 'NHP27_TP3_CD14negCD16pos', 'NHP27_TP4_CD14negCD16pos', 'NHP21_TP1_CD14negCD16pos', 'NHP21_TP1_CD14negCD16pos']

Finally, I have tried removing extra blank lines and carriage returns, and that hasn't fixed it either. I don't see where the issue is coming from.

Attaching samples file as example. samples.txt

dryrun error

TypeError in line 39 of /gpfs/gsfs10/users/CCBR_Pipeliner/Pipelines/ASAP/v0.5.3/workflow/rules/init.smk:
stat: path should be string, bytes, os.PathLike or integer, not Series

Using ASAP found here

/data/CCBR_Pipeliner/Pipelines/ASAP/v0.5.3

Need an extra pair of eyes, @kopardev!

kopardev commented 1 year ago
% cut -f1 samples.txt|sort|uniq -c|sort -k1,1nr       
   2
   2 NHP21_TP1_CD14negCD16pos
   1 NHP10_TP1_CD14negCD16pos
   1 NHP10_TP2_CD14negCD16pos
   1 NHP10_TP2_CD14posDRneg
   1 NHP10_TP2_CD14posDRpos
   1 NHP10_TP3_CD14negCD16pos
   1 NHP10_TP3_CD14posDRneg
   1 NHP10_TP3_CD14posDRpos
   1 NHP10_TP4_CD14negCD16pos
   1 NHP10_TP4_CD14posDRneg
   1 NHP10_TP4_CD14posDRpos
   1 NHP17_TP1_CD14negCD16pos
   1 NHP17_TP1_CD14posDRneg
   1 NHP17_TP1_CD14posDRpos
   1 NHP17_TP2_CD14negCD16pos
   1 NHP17_TP2_CD14posDRneg
   1 NHP17_TP2_CD14posDRpos
   1 NHP17_TP3_CD14negCD16pos
   1 NHP17_TP3_CD14posDRneg
   1 NHP17_TP3_CD14posDRpos
   1 NHP17_TP4_CD14negCD16pos
   1 NHP17_TP4_CD14posDRneg
   1 NHP17_TP4_CD14posDRpos
   1 NHP21_TP1_CD14posDRneg
   1 NHP21_TP1_CD14posDRpos
   1 NHP22_TP1_CD14posDRneg
   1 NHP22_TP1_CD14posDRpos
   1 NHP27_TP1_CD14negCD16pos
   1 NHP27_TP1_CD14posDRneg
   1 NHP27_TP1_CD14posDRpos
   1 NHP27_TP2_CD14negCD16pos
   1 NHP27_TP2_CD14posDRneg
   1 NHP27_TP2_CD14posDRpos
   1 NHP27_TP3_CD14negCD16pos
   1 NHP27_TP3_CD14posDRneg
   1 NHP27_TP3_CD14posDRpos
   1 NHP27_TP4_CD14negCD16pos
   1 NHP27_TP4_CD14posDRneg
   1 NHP27_TP4_CD14posDRpos
   1 replicateName

Observations:

kopardev commented 1 year ago

BTW, NHP21_TP1_CD14negCD16pos is also the 39th replicate in your example. It has nothing to do with it being the 39th; the problem is that the replicateName of the 39th entry is repeated.
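
To illustrate why a repeated replicateName produces the "not Series" TypeError, here is a minimal sketch. It assumes the manifest is loaded into a pandas DataFrame indexed by replicateName and that each path is looked up with `.loc`; the thread does not show the actual init.smk code, so that is an assumption, and the replicate names and paths below are made up.

```python
import pandas as pd

# Toy manifest (hypothetical names and paths): the last two rows share
# one replicateName, mimicking the duplicated entry in the real manifest.
manifest = pd.DataFrame(
    {
        "replicateName": ["repA", "repB", "repB"],
        "path_to_R1_fastq": [
            "/tmp/a_r1.fq.gz",
            "/tmp/b_r1.fq.gz",
            "/tmp/c_r1.fq.gz",
        ],
    }
).set_index("replicateName")

# A label that occurs once yields a plain string ...
unique_hit = manifest.loc["repA", "path_to_R1_fastq"]
# ... but a label that occurs more than once yields a Series of all matches.
duplicate_hit = manifest.loc["repB", "path_to_R1_fastq"]

print(type(unique_hit).__name__)
print(type(duplicate_hit).__name__)
print(len(duplicate_hit))
```

Passing that Series to a file-existence check that expects a single path would plausibly raise the TypeError reported in the dry run, regardless of where in the manifest the duplicate sits.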

slsevilla commented 1 year ago

Ah! I looked at uniq too and didn't catch this.

I'll follow up with Kate and have her fix the manifest. Will update issue if this resolves it.

On Tue, Dec 13, 2022, 21:34 Vishal Koparde @.***> wrote:

BTW, NHP21_TP1_CD14negCD16pos is also the 39th replicate in your example. It has nothing to do with it being the 39th; the problem is that the replicateName of the 39th entry is repeated.


slsevilla commented 1 year ago

Issue resolved by fixing the replicate name. Adding an issue to check for duplicate replicate names to ensure this doesn't happen moving forward.
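
A check along those lines could look like the sketch below. This is not the check that was added to the pipeline; it is a standard-library illustration that assumes a tab-delimited manifest with a replicateName header column (as the `cut -f1` output above suggests). The function name `find_duplicate_replicates` and the example rows are hypothetical.

```python
import csv
import io
from collections import Counter


def find_duplicate_replicates(handle, column="replicateName"):
    """Return {name: count} for names seen more than once; blank names are ignored."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter((row.get(column) or "").strip() for row in reader)
    counts.pop("", None)  # drop rows with a missing or empty name
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical manifest content; a real run would open the samples file instead.
example = io.StringIO(
    "replicateName\tpath_to_R1_fastq\tpath_to_R2_fastq\n"
    "repA\t/tmp/a_r1.fq.gz\t/tmp/a_r2.fq.gz\n"
    "repB\t/tmp/b_r1.fq.gz\t/tmp/b_r2.fq.gz\n"
    "repB\t/tmp/c_r1.fq.gz\t/tmp/c_r2.fq.gz\n"
)
duplicates = find_duplicate_replicates(example)
print(duplicates)  # {'repB': 2}
```

Running a check like this before the R1/R2 existence loop, and stopping with a message that names the repeated replicates, would surface the manifest problem directly instead of a TypeError further downstream.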