Run SoloTE without gene quantification

mathosi commented 3 months ago

Hello,

Is it possible to run SoloTE without gene quantification, so that only TE counts are reported in the outputs? Gene counts are already available for the dataset I am analyzing, so skipping them would save a lot of disk space (and possibly computation time?).

Thanks! Malte

bvaldebenitom commented 2 months ago

Hi @mathosi!

Sorry for the late reply. I think you can save some disk space in the first stage of the pipeline, though I'm not sure whether it is straightforward to merge the two matrices (the original gene matrix and the TE matrix from SoloTE) afterwards.

What you can do is run the following command yourself: samtools view -@ CPUS -O BAM -o newBAM -L TE_BEDfile -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")' inputBAM

This filters the input BAM file down to the reads that overlap a TE region in TE_BEDfile, carry a valid cell barcode (CB) and UMI (UB), and have no gene assignment (GN tag missing or set to "-").
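If the filtering step has to be repeated for several samples, the command can be wrapped in a small Bash function. This is only a sketch and not part of SoloTE: the function name `filter_te_reads`, its argument order and the default of 4 threads are hypothetical, and it assumes a samtools version recent enough to support `-e` filter expressions.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the samtools filter suggested above.
# filter_te_reads and its arguments are illustrative names, not SoloTE options.
set -euo pipefail

# Keep reads with a valid cell barcode (CB) and UMI (UB) that are not
# assigned to any gene (GN tag absent or "-").
TE_ONLY_EXPR='(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")'

# Usage: filter_te_reads IN_BAM TE_BED OUT_BAM [THREADS]
filter_te_reads() {
  local in_bam=$1 te_bed=$2 out_bam=$3 threads=${4:-4}
  # -L keeps only reads overlapping the TE regions in the BED file;
  # -e applies the barcode/UMI/gene tag filter defined above.
  samtools view -@ "$threads" -O BAM -o "$out_bam" \
    -L "$te_bed" -e "$TE_ONLY_EXPR" "$in_bam"
}
```

For example, `filter_te_reads input.bam TE_annotation.bed te_only.bam 8` (hypothetical file names) writes the TE-only reads to `te_only.bam`, which would then replace the full BAM as input to SoloTE.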

Hope this helps.

mathosi commented 2 months ago

Hi @bvaldebenitom, many thanks for your suggestion. I will try it out and close this issue for now.

Issue #42 in bvaldebenitom/SoloTE