akdess / BAFExtract


Create own reference Pileup #2

Closed TobiTekath closed 5 years ago

TobiTekath commented 5 years ago

Hi, I would really like to test your tool with my bulk RNA-seq data. It seems I used a different genomic reference when aligning my reads, so some of my chromosome names and sizes differ.

I created my own chromosome size list, but I am stuck on creating a genome_fasta_pileup, which (I guess) is built from the genomic reference used for mapping. How can I create a fasta_pileup for a genomic reference other than the one you provide?

Please excuse me if I did not understand the workflow correctly.

As a side note: I think it is crucial to mention that MAPQ values can differ a lot depending on the aligner used. Setting [Minimum mapping quality] to 50 (as you suggest) could mean that no reads survive the filtering, purely because of how the aligner implements MAPQ.
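One way to check this before picking a threshold is to tabulate the MAPQ values that actually occur in the alignments. The following is a minimal sketch, not part of BAFExtract: the function name `mapq_histogram` and the file name `sample.bam` are hypothetical, and the usage line assumes samtools is installed.

```shell
# Tabulate MAPQ values (column 5 of each SAM record) read from stdin.
# Header lines, which start with '@', are skipped.
# Output: one line per distinct MAPQ value, "<MAPQ><TAB><read count>",
# sorted by MAPQ in ascending order.
mapq_histogram() {
    awk -F '\t' '
        !/^@/ { count[$5]++ }
        END   { for (q in count) print q "\t" count[q] }
    ' | sort -n
}

# Hypothetical usage (sample.bam is a placeholder file name):
#   samtools view sample.bam | mapq_histogram
```

If the highest MAPQ value in the output is below the chosen minimum mapping quality, no reads would pass the filter.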

Thanks in advance!

akdess commented 5 years ago

Dear Tobias, we have updated the documentation and the code.

You can create genome_fasta_pileup_dir files for other genomes using the following command:

    BAFExtract -preprocess_FASTA [FASTA file path] [Output directory]

For example:

    wget -c http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz
    tar -xvzf chromFa.tar.gz
    mkdir ../mm10
    FILES=./*fa
    for f in $FILES
    do
        echo "Processing $f file..."
        BAFExtract -preprocess_FASTA $f ../mm10
    done

Or, if you have a single FASTA file, you can simply run:

    BAFExtract -preprocess_FASTA chrom.fa ../output_dir
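Because the chromosome size list has to match the same reference, it may help to derive the sizes from the very FASTA files passed to -preprocess_FASTA. The sketch below is an illustration only: the function name `fasta_chrom_sizes` is hypothetical, and the tab-separated "name, length" output is an assumption that should be checked against the list format BAFExtract expects.

```shell
# Print "<sequence name><TAB><sequence length>" for every record in the
# given FASTA file(s). The name is the first whitespace-delimited token
# after '>'. Assumes Unix line endings (a trailing carriage return would
# be counted as one extra base per line).
fasta_chrom_sizes() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len   # flush previous record
            name = substr($1, 2)                  # drop the leading ">"
            len = 0
            next
        }
        { len += length($0) }                     # accumulate sequence length
        END { if (name != "") print name "\t" len }
    ' "$@"
}

# Hypothetical usage with the per-chromosome files from the example above:
#   fasta_chrom_sizes ./*fa > my_genome.list
```

Comparing the names in this output with the chromosome names in the BAM header is a quick way to spot naming mismatches such as "chr1" versus "1".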

Thanks a lot for the side note. I added your note and acknowledged you in the documentation.

Let me know if you have any other questions.