galelab / scPathoQuant

single cell pathogen alignment and quantification tool
BSD 3-Clause "New" or "Revised" License

Please remove all pathogen_al_*.csv file outputs from the pipeline as a new release #6

Open JakeLehle opened 5 days ago

JakeLehle commented 5 days ago

Hello,

I really like the pipeline, but I have a suggestion for a change to incorporate into a new release. I'm working on metagenomic studies where I have made a reference with a ton of viral sequences (more than 10 million), all in one huge viral_ref.fasta. Most of them won't align to any single-cell seq sample I pump through the pipeline, but I can't run this script if it is going to make 10 million empty .csv files. As far as outputs go, the only things I really care about are the filtered_feature_bc_matrix folder, the BAM, and the coverage map.pdf. Everything else is bogging down my system and not adding value to the analysis of outputs.

I love the pipeline, but please consider removing all of the *.csv outputs. I think they are redundant, as this information is available in the BAMs, which are only output if there is a hit. P.S. I'm on a grant deadline crunch, so I'm gonna poke around and start cutting things on my branch. If I can get mine to work, I'll make a pull request.

lwhitmore commented 4 days ago

Hi, yes, thank you for the suggestion. I will definitely incorporate this in the next release.

lwhitmore commented 4 days ago

Jake, I think if you comment out these lines in these two scripts, that should temporarily solve the problem:

patho_genes.py = lines 66 & 73
patho_copies.py = lines 73 & 76

Both scripts are in the quantify directory.
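For the longer-term change, one possible shape is to guard each CSV write so that nothing is created when a reference sequence has no hits. The snippet below is only a rough sketch of that idea and is not the actual scPathoQuant code: the function name `write_csv_if_hits`, its arguments, and the column names in the usage note are all hypothetical.

```python
import csv
from pathlib import Path


def write_csv_if_hits(rows, out_path, header):
    """Write rows to out_path only when there is at least one row.

    Hypothetical helper for illustration; returns True if a file was
    written and False if the write was skipped.
    """
    rows = list(rows)
    if not rows:
        # No hits for this reference sequence: skip creating an empty CSV.
        return False

    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return True
```

With a helper like this, a call such as `write_csv_if_hits(counts, "out/example.csv", ["barcode", "count"])` would leave no file behind when `counts` is empty, so a reference with millions of unmatched sequences would not produce millions of empty files.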

Thanks, Leanne

JakeLehle commented 4 days ago

Hello,

Thank you so much for getting to this so quickly, especially right before the holiday. You rock! I'll test this out over the weekend and confirm.