I am updating the default index for each of the workflows to `spliced_txome_k31`, the index generated from the spliced-only cDNA FASTA as in #68. For now, to use `spliced_intron_txome_k31` instead, you will need to pass it at the command line: `nextflow run alevin-quant/run-alevin.nf --index_name spliced_intron_txome_k31`. I have also updated the default `t2gene.tsv` files to use `Homo_sapiens.GRCh38.103.spliced.tx2gene.tsv`.
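For reference, a minimal sketch of how a default like this is typically declared so that `--index_name` can override it. Only the parameter name `index_name` and the two index names come from this PR; where the declaration lives (workflow script or `nextflow.config`) is an assumption.

```groovy
// Sketch only: default value for the index, overridable at the command line.
// Nextflow maps `--index_name <value>` onto `params.index_name`.
params.index_name = 'spliced_txome_k31'

// To use the spliced + intron index instead:
//   nextflow run alevin-quant/run-alevin.nf --index_name spliced_intron_txome_k31
```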
Additionally, while running the workflows with the snRNA-seq data, I noticed that they required more memory than before. The pre-mRNA index generated by Kallisto, in particular, is 45 GB and required ~100 GB of memory for the samples that I did test runs on. With less, it is unable to load the index and write the output files. I am using 120 GB of memory here to be safe.
It appears that Alevin also requires slightly more memory than the 28 GB allotted by the `cpus_8` label in the configuration file, but nowhere near the 100 GB that Kallisto requires.
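As a rough illustration of the memory settings described above, label-based resources in a Nextflow configuration file look something like the sketch below. The `cpus_8` label and the 28 GB and 120 GB figures come from this PR; the `cpus = 8` value is inferred from the label name, and `bigmem` is a hypothetical label name used only for illustration.

```groovy
// Sketch only: label-based resource settings in a Nextflow config file.
process {
  withLabel: cpus_8 {
    cpus = 8          // assumed from the label name
    memory = 28.GB    // current allotment mentioned above
  }
  withLabel: bigmem { // hypothetical label, not defined in this repo
    memory = 120.GB   // headroom over the ~100 GB Kallisto needed in test runs
  }
}
```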
I spent some time trying to assign labels based on an input variable so that the higher memory request would apply only to Kallisto with snRNA-seq samples, but it appears that there is no way to use input variables in label assignments (https://github.com/nextflow-io/nextflow/issues/894). The one potential workaround I found is to include a second configuration file based on an input parameter, but after some back and forth with that, I landed on leaving it as is for now.
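One alternative that might be worth a look later, which is not what this PR does and which I have not tested against these workflows: Nextflow lets the `memory` directive itself be a closure, so the request can depend on a pipeline parameter even though the label cannot. In the sketch below, the process name `kallisto_bus` and the parameter `params.seq_unit` are hypothetical placeholders.

```groovy
// Sketch only: dynamic memory keyed on a parameter instead of a dynamic label.
// `kallisto_bus` and `params.seq_unit` are hypothetical names.
process {
  withName: kallisto_bus {
    // The closure is evaluated per task, so a single config file covers both cases.
    memory = { params.seq_unit == 'nucleus' ? 120.GB : 28.GB }
  }
}
```

With something like this, only the Kallisto process would request 120 GB, and only when the run is flagged as single-nucleus.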
I am leaving this as a draft for now, as I still need to change the references in alevin-fry and ensure that the workflow runs as expected.