JiekaiLab / scTE

MIT License

broken pipe while processing #25

Closed: dawe closed this issue 2 years ago

dawe commented 2 years ago

Hi all, I am trying scTE on some scRNA-seq data of mine (hg38). I have BAM files generated with STARsolo and, following the instructions, I'm quantifying TEs like this:

scTE -i ${SAMPLE}Aligned.sortedByCoord.out.bam -o ${SAMPLE}_TE -x hg38.exclusive.idx --hdf5 True -CB CR -UMI UR -p 8

All samples are being processed, but in some log files I find this message:

[…]
INFO    : Loading the genome annotation index... 2021-11-18 11:13:28
INFO    : Loaded '/beegfs/scratch/ric.cosr/cittaro.davide/Ref/scTE/hg38/hg38.exclusive.idx' binary file with 4779764 items
INFO    : Finished loading the genome annotation index... 2021-11-18 11:14:06 

INFO    : Processing BAM/SAM files ...2021-11-18 11:14:06
INFO    : Input SAM/BAM file appears to be valid
sed: couldn't write 50 items to stdout: Broken pipe
sed: couldn't write 53 items to stdout: Broken pipe
sed: couldn't write 58 items to stdout: Broken pipe
awk: cmd. line:1: (FILENAME=- FNR=131567431) fatal: print to "standard output" failed (Broken pipe)
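A `Broken pipe` message means the writing process has lost its reader. Once the process reading the other end of a pipe exits, further writes into that pipe fail, either through SIGPIPE or, when that signal is ignored, through EPIPE. In a chain like the forked command, one stage exiting makes every stage upstream of it fail in turn. Three `sed` messages followed by the first `awk` therefore fit a stage downstream of the last `sed` having exited first. The log does not say which stage that was, and a process killed by the kernel's OOM killer typically prints nothing itself.

The minimal sketch below reproduces the effect, with `head` standing in for a reader that goes away early. `broken_pipe_demo` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical demo of a broken pipe: `head` exits after the first line, so
# `seq` can no longer write. With default SIGPIPE handling, bash reports the
# writer's status as 141 (128 + signal 13); if SIGPIPE is ignored, seq gets
# EPIPE instead, prints a write error and exits non-zero.
broken_pipe_demo() {
  seq 1 1000000 | head -n 1
  # Copy PIPESTATUS right away: the next command would overwrite it.
  statuses=("${PIPESTATUS[@]}")
  writer_status=${statuses[0]}
  reader_status=${statuses[1]}
}

broken_pipe_demo
echo "writer exit: $writer_status, reader exit: $reader_status"
```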

The forked process

samtools view -@ 8 HCT116_FOLFIRI_LTAligned.sortedByCoord.out.bam | awk '{OFS="\t"}{for(i=12;i<=NF;i++)if($i~/CR:Z:/)n=i}{for(i=12;i<=NF;i++)if($i~/UR:Z:/)m=i}{print $3,$4,$4+100,$n,$m}' | sed -r 's/CR:Z://g' | sed -r 's/UR:Z://g'| sed -r 's/^chr//g' | awk '!x[$4$5]++' | gzip -c > HCT116_FOLFIRI_LT_TE_scTEtmp/o1/HCT116_FOLFIRI_LT_TE.bed.gz

is apparently still running, but compared with other processes launched at the same time, it seems stuck generating the contents of the o1 folder. Any hints?
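For context, the forked command turns each alignment into a BED-like line with five fields:

- the chromosome, without its `chr` prefix
- the position
- the position + 100
- the cell barcode, taken from `CR`
- the UMI, taken from `UR`

It then keeps only the first line seen for each barcode+UMI pair, wherever that pair maps. The last stage, `awk '!x[$4$5]++'`, holds every distinct key in an in-memory array, so its memory use grows with the number of distinct barcode+UMI pairs.

A minimal sketch that replays the same stages on plain SAM text, without `samtools`, follows. `sam_to_bed` is a hypothetical name, the tab output separator is an assumption, and the three `sed -r` calls are merged into a single `sed`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replays the o1 stage of the forked pipeline on SAM text
# read from stdin (no samtools). Tag names CR/UR match -CB CR -UMI UR.
#   1. awk: in the optional-tag fields (12..NF) find the barcode field (n) and
#      the UMI field (m), then print chrom, pos, pos+100, barcode, UMI.
#      As in the original, n and m are not reset between records.
#   2. sed: strip the CR:Z: / UR:Z: prefixes and a leading "chr"
#      (the same substitutions as the three `sed -r` calls).
#   3. awk: keep only the first line per barcode+UMI key; the array x holds
#      every key seen so far, so memory grows with the number of keys.
sam_to_bed() {
  awk 'BEGIN { OFS = "\t" }
       { for (i = 12; i <= NF; i++) if ($i ~ /CR:Z:/) n = i }
       { for (i = 12; i <= NF; i++) if ($i ~ /UR:Z:/) m = i }
       { print $3, $4, $4 + 100, $n, $m }' |
    sed -e 's/CR:Z://g' -e 's/UR:Z://g' -e 's/^chr//' |
    awk '!x[$4$5]++'
}

# One toy alignment on chr1:1000 with barcode AAAC and UMI GGTT;
# prints the tab-separated line: 1  1000  1100  AAAC  GGTT
printf 'r1\t0\tchr1\t1000\t255\t4M\t*\t0\t0\tACGT\tIIII\tCR:Z:AAAC\tUR:Z:GGTT\n' | sam_to_bed
```

A second alignment with the same barcode and UMI on another chromosome would be dropped by the final stage. A different UMI would produce a new line.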

jphe commented 2 years ago

It seems to have run out of memory: scTE takes ~10 GB per thread. Can you try a lower -p value (fewer threads)?
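If the ~10 GB-per-thread figure holds, the `-p` value can be worked out from the memory that is actually free. Several samples here were launched at the same time, so that budget is shared between the concurrent scTE runs.

The minimal sketch below works under those assumptions. `avail_gb` and `max_threads` are hypothetical helpers, the 10 GB default comes from the reply above, and `avail_gb` assumes Linux (the `MemAvailable` line of `/proc/meminfo`).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for choosing scTE's -p value from a memory budget.

# Available memory in whole GiB (Linux only: MemAvailable is reported in kB).
avail_gb() {
  awk '/^MemAvailable:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# max_threads AVAIL_GB [PER_THREAD_GB] [CONCURRENT_JOBS]
# Threads per job so that all concurrent jobs together stay within AVAIL_GB;
# never returns less than 1, even when memory is short.
max_threads() {
  local avail=$1 per_thread=${2:-10} jobs=${3:-1}
  local t=$(( avail / (per_thread * jobs) ))
  if (( t < 1 )); then t=1; fi
  echo "$t"
}

# Example: 128 GB free and 4 samples running at once -> 3 threads each.
max_threads 128 10 4
```

In a run script, the value could be passed as `-p "$(max_threads "$(avail_gb)" 10 4)"` in the scTE command from the original post. Because of the one-thread floor, a small machine can still exceed its budget; in that case fewer samples should run at once.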