Closed vladsavelyev closed 6 years ago
Hi,
I downloaded WGS 10x data from https://support.10xgenomics.com/genome-exome/datasets/2.1.4/NA12878_WGS_v2 and subset it to 10k reads with seqtk sample:
seqtk sample -s100 NA12878_WGS_v2_S1_L001_R1_001.fastq 10000 > subset10k_L001_R1_001.fastq
seqtk sample -s100 NA12878_WGS_v2_S1_L001_R2_001.fastq 10000 > subset10k_L001_R2_001.fastq
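Using the same seed (-s100) for both files is what keeps mates paired when subsampling. The idea can be sketched in Python as follows. This is only an illustration, not seqtk's actual algorithm; the function name `paired_subsample` and the synthetic records are made up for the example.

```python
import random


def paired_subsample(r1_records, r2_records, k, seed=100):
    """Pick the same k record positions from two mate files.

    Each record is a 4-tuple (header, sequence, '+', qualities).
    Hypothetical helper: it only illustrates why one shared seed
    keeps R1 and R2 in sync.
    """
    if len(r1_records) != len(r2_records):
        raise ValueError("R1 and R2 must contain the same number of records")
    rng = random.Random(seed)  # one seed -> one set of positions for both files
    k = min(k, len(r1_records))
    keep = sorted(rng.sample(range(len(r1_records)), k))
    return [r1_records[i] for i in keep], [r2_records[i] for i in keep]


if __name__ == "__main__":
    # Synthetic data: ten read pairs named read0..read9.
    r1 = [(f"@read{i}/1", "ACGT", "+", "IIII") for i in range(10)]
    r2 = [(f"@read{i}/2", "TGCA", "+", "IIII") for i in range(10)]
    sub1, sub2 = paired_subsample(r1, r2, 4)
    for a, b in zip(sub1, sub2):
        print(a[0], b[0])
```

Because the positions are chosen once and applied to both lists, every sampled R1 record still has its mate at the same position in the R2 subset.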
I took the barcodes from the tenkit package from 10x: https://github.com/10XGenomics/supernova/blob/master/tenkit/lib/python/tenkit/barcodes/4M-with-alts-february-2016.txt
Then I tried to run the pipeline. The count command ran fine (although the first attempt crashed on compressed input, before I figured out that .fastq.gz is not supported):
cat subset10k_L001_R*_001.fastq | ema count -1 - -w 4M-with-alts-february-2016.txt -o counts_file
However, the following preproc command dies with this error:
cat subset10k_L001_R*_001.fastq | ema preproc -1 - -w 4M-with-alts-february-2016.txt -c counts_file -n 2 >
ema: src/preprocess.c:389: preprocess_fastqs: Assertion `__extension__ ({ size_t __s1_len, __s2_len; (__builtin_constant_p (id1) && __builtin_constant_p (id2) && (__s1_len = strlen (id1), __s2_len = strlen (id2), (!((size_t)(const void *)((id1) + 1) - (size_t)(const void *)(id1) == 1) || __s1_len >= 4) && (!((size_t)(const void *)((id2) + 1) - (size_t)(const void *)(id2) == 1) || __s2_len >= 4)) ? __builtin_strcmp (id1, id2) : (__builtin_constant_p (id1) && ((size_t)(const void *)((id1) + 1) - (size_t)(const void *)(id1) == 1) && (__s1_len = strlen (id1), __s1_len < 4) ? (__builtin_constant_p (id2) && ((size_t)(const void *)((id2) + 1) - (size_t)(const void *)(id2) == 1) ? __builtin_strcmp (id1, id2) : (__extension__ ({ const unsigned char *__s2 = (const unsigned char *) (const char *) (id2); register int __result = (((const unsigned char *) (const char *) (id1))[0] - __s2[0]); if (__s1_len > 0 && __result == 0) { __result = (((const unsigned char *) (const char *) (id1))[1] - __s2[1]); if (__s1_len > 1 && __result == 0) { __result = (((const unsigned char *) (const char *) (id1))[2] - __s2[2]); if (__s1_len > 2 && __result == 0) __result = (((const unsigned char *) (const char *) (id1))[3] - __s2[3]); } } __result; }))) : (__builtin_constant_p (id2) && ((size_t)(const void *)((id2) + 1) - (size_t)(const void *)(id2) == 1) && (__s2_len = strlen (id2), __s2_len < 4) ? (__builtin_constant_p (id1) && ((size_t)(const void *)((id1) + 1) - (size_t)(const void *)(id1) == 1) ? __builtin_strcmp (id1, id2) : (__extension__ ({ const unsigned char *__s1 = (const unsigned char *) (const char *) (id1); register int __result = __s1[0] - ((const unsigned char *) (const char *) (id2))[0]; if (__s2_len > 0 && __result == 0) { __result = (__s1[1] - ((const unsigned char *) (const char *) (id2))[1]); if (__s2_len > 1 && __result == 0) { __result = (__s1[2] - ((const unsigned char *) (const char *) (id2))[2]); if (__s2_len > 2 && __result == 0) __result = (__s1[3] - ((const unsigned char *) (const char *) (id2))[3]); } } __result; }))) : __builtin_strcmp (id1, id2)))); }) == 0' failed.
[1] 1448 broken pipe cat subset10k_L001_R*_001.fastq |
1449 abort ema preproc -1 - -w 4M-with-alts-february-2016.txt -c counts_file -n 2
Does it have something to do with subsetting the input?
Attaching a tarball with the inputs.
NA12878_WGS_10x_subset10k.gz
Just as I submitted the issue, I figured out that for preproc I have to pass the FASTQ files separately with the -1 and -2 options :) Works now.
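For anyone hitting the same assertion: the long expression in the error message looks like glibc's inline expansion of strcmp(id1, id2) == 0, so preproc apparently found two records whose read IDs differ where it expected mates. Presumably, piping the concatenated R1 and R2 files into -1 - put non-mates next to each other. A quick way to rule out a pairing problem in the files themselves is to compare read IDs record by record. Below is a minimal Python sketch; the helper names `read_ids` and `first_mismatch` are hypothetical, and the ID normalization (first whitespace-delimited token, optional /1 or /2 suffix dropped) is an assumption that may not match what ema does internally.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID of every 4-line FASTQ record in a text handle.

    Assumption: the ID is the first whitespace-delimited token of the
    header line, without the leading '@' and without a /1 or /2 suffix.
    """
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only header lines carry the read ID
        header = line.strip()
        if not header:
            continue  # tolerate a trailing blank line
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {header!r}")
        name = header[1:].split()[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(r1, r2):
    """Return (record_index, id1, id2) for the first unpaired record, or None."""
    for n, (a, b) in enumerate(zip_longest(read_ids(r1), read_ids(r2))):
        if a != b:  # also triggers when one file is shorter (None vs. ID)
            return n, a, b
    return None


if __name__ == "__main__":
    # Synthetic two-record files standing in for the R1/R2 subsets.
    r1 = io.StringIO("@read1 1:N:0\nACGT\n+\nIIII\n@read2 1:N:0\nTTTT\n+\nIIII\n")
    r2 = io.StringIO("@read1 2:N:0\nGGCC\n+\nIIII\n@read2 2:N:0\nAAAA\n+\nIIII\n")
    print(first_mismatch(r1, r2))
```

To check real files, pass `open("subset10k_L001_R1_001.fastq")` and `open("subset10k_L001_R2_001.fastq")` in place of the in-memory handles. A result of None means the two files are paired record for record.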