Closed mathosi closed 2 months ago
Hi @mathosi!
Sorry for the late reply. I think you can save some disk space in the first stage of the pipeline, though I'm not sure whether it is straightforward to merge the two matrices (the original gene matrix and the TE matrix from SoloTE) afterwards.
What you can do is run the following command yourself:
samtools view -@ CPUS -O BAM -o newBAM -L TE_BEDfile -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")' inputBAM
This filters the input BAM file (inputBAM, a placeholder like CPUS, newBAM and TE_BEDfile) down to the reads that overlap TEs, carry a cell barcode (CB) and UMI (UB), and are not assigned to a gene (GN).
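In case it is useful, here is a minimal sketch of the same filter wrapped in a small bash function. The function name `filter_te_reads`, the `DRY_RUN` switch and the file names in the usage line are only illustrative assumptions, not part of SoloTE. It also assumes a samtools version that supports `-e` filter expressions.

```shell
#!/usr/bin/env bash
# Sketch only: filter_te_reads and DRY_RUN are hypothetical names, not part of SoloTE.
# Keeps reads that overlap the TE BED regions, have CB and UB tags set,
# and have no gene assignment in the GN tag.
filter_te_reads() {
  local in_bam=$1    # input BAM with CB/UB/GN tags
  local te_bed=$2    # BED file with TE coordinates
  local out_bam=$3   # filtered BAM to write
  local cpus=${4:-4} # additional threads for samtools (default is an assumption)

  # Same filter expression as in the command above.
  local expr='(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")'

  # With DRY_RUN set, the command is printed instead of executed.
  ${DRY_RUN:+echo} samtools view -@ "$cpus" -O BAM -o "$out_bam" \
    -L "$te_bed" -e "$expr" "$in_bam"
}

# Example (hypothetical file names):
#   DRY_RUN=1 filter_te_reads sample.bam TEs.bed sample_TEonly.bam 8
```

Running it once with `DRY_RUN=1` lets you check the exact samtools call before writing a new BAM file.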
Hope this helps.
Hi @bvaldebenitom, many thanks for your suggestion. I will try it out and close this issue for now.
Hello,
is it possible to run SoloTE without gene quantification such that only TE counts are reported in the outputs? Since the gene counts are already available for the dataset I am analyzing, this would save a lot of disk space (and potentially computation time?).
Thanks! Malte