AlexsLemonade / alsf-scpca

Management and analysis tools for ALSF Single-cell Pediatric Cancer Atlas data.
BSD 3-Clause "New" or "Revised" License

Add star indexing workflow #144

Closed jashapiro closed 2 years ago

jashapiro commented 2 years ago

As part of the genetic demultiplexing work (https://github.com/AlexsLemonade/alsf-scpca/issues/127), we need a STAR genome index. We will use this for bulk mapping to find SNPs.

This PR adds a workflow to create that index, putting the result with the other indices we have created at: s3://nextflow-ccdl-data/reference/homo_sapiens/ensembl-104/star_index/
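For readers unfamiliar with STAR, index creation comes down to one `STAR --runMode genomeGenerate` call. The sketch below only assembles that command line in Python. It is not the Nextflow process added in this PR, and the file names, thread count, and `--sjdbOverhang` value are illustrative assumptions.

```python
from pathlib import Path


def star_index_command(fasta, gtf, index_dir, threads=8, sjdb_overhang=100):
    """Build the argument list for STAR genome index generation.

    All paths and default values are placeholders, not values from the PR.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", str(Path(index_dir)),
        # STAR reads the FASTA and GTF directly, so both must be uncompressed.
        "--genomeFastaFiles", str(Path(fasta)),
        "--sjdbGTFfile", str(Path(gtf)),
        "--sjdbOverhang", str(sjdb_overhang),
    ]


# Hypothetical file names, for illustration only.
cmd = star_index_command("genome.fa", "annotation.gtf", "star_index")
print(" ".join(cmd))
```

The list form can be passed straight to `subprocess.run(cmd, check=True)` on a machine where STAR is installed.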

Currently the index is based on the full gtf file; it might be a bit smaller if we reduced it to the gtf file that only includes the genes in the transcriptome gtf; I may test this.

Otherwise, this should be a pretty simple workflow. My only annoyance is that the input files had to be decompressed on disk :-(.
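The decompression step could look roughly like the following. This is a minimal Python sketch, not the code in the workflow, and the helper name `gunzip_to_disk` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to_disk(gz_path, out_dir="."):
    """Decompress a .gz file into out_dir and return the path of the new file."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    # "genome.fa.gz" -> "genome.fa"
    out_path = Path(out_dir) / gz_path.stem
    # Stream the data so large references are never held in memory at once.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

The uncompressed copies are only needed while the index is being built and can be deleted afterwards.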

The other question is the naming of the index output directory. I named it with just the assembly name, but I could add a suffix such as .star_idx if that seems worth doing.

jashapiro commented 2 years ago

> Currently the index is based on the full gtf file; it might be a bit smaller if we reduced it to the gtf file that only includes the genes in the transcriptome gtf; I may test this.

Nope, doesn't make a difference. Unless there is another reason, I will stick with the default Ensembl file for simplicity.