Closed ecroot closed 11 months ago
Hi, The error occurs because the current KAS-Analyzer does not support the t2t reference genome (no blacklist file is available for it). You could create a blank blacklist file and put it in the blacklist directory. Best, Ruitu
On Thu, Nov 2, 2023 at 11:24 AM E Croot @.***> wrote:
Is your feature request related to a problem? Please describe. I am trying to run KAS-Analyzer with bwa mapping to the t2t (hs1) human genome assembly. The command I am using is:
KAS-Analyzer KAS-seq -a bwa -t 10 -i ~/Data/ref_genomes/t2t/hs1.fa -e 150 -o NO2test -s hs1 -1 NO2_S11_R1_001_trimmed.fq.gz
After running successfully through the following stages:
- bwa mapping to my indexed t2t reference genome
- samtools sort
- samtools rmdup
The command gets stuck on the stage "Transfer test_rmdup.bam into test.bed with bamToBed." with the error:
Error: Unable to open file /path/Programs/KAS-Analyzer/scripts/../blacklist/hs1-blacklist.bed. Exiting.
I have checked the blacklist directory, and there does not appear to be a t2t-related blacklist file available.
It is frustrating to get this far before the error occurs.
Describe the solution you'd like
- t2t compatibility, by either:
  - incorporation of a relevant blacklist or other exclusion list (I'm not sure that there is an ENCODE blacklist for t2t yet, but there are other, similar efforts such as https://academic.oup.com/bioinformatics/article/39/4/btad198/7126418), and relevant methods to handle it
  - skipping the blacklist step for t2t - this may be appropriate given that it is more complete than other builds
- earlier checks for a valid blacklist argument in the command, so that if an invalid genome version is requested, then the command fails immediately, with a relevant warning message
- documentation to clarify that t2t alignment is not (yet) supported
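The "earlier checks" idea above could look roughly like the sketch below. It is not part of KAS-Analyzer: the function name `check_blacklist` and its arguments are hypothetical, and the only thing taken from the tool is the blacklist/<assembly>-blacklist.bed path pattern shown in the error message.

```shell
# check_blacklist KAS_DIR ASSEMBLY   (hypothetical helper, not a KAS-Analyzer command)
# Fails fast if the blacklist file that the pipeline will later try to open is missing.
check_blacklist() {
  kas_dir=$1
  assembly=$2
  # Path pattern copied from the error message in this issue.
  blacklist="$kas_dir/blacklist/${assembly}-blacklist.bed"
  if [ ! -r "$blacklist" ]; then
    echo "Error: no blacklist for assembly '$assembly' (expected $blacklist)" >&2
    return 1
  fi
  return 0
}

# Example, run before starting the mapping (install path is an assumption):
# check_blacklist "$HOME/Programs/KAS-Analyzer" hs1 || exit 1
```

Running a check like this before the bwa step would surface the unsupported assembly in seconds rather than after mapping, sorting and duplicate removal.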
Hi Ruitu, thank you for your suggestion. Adding a blank hs1-blacklist.bed file to the KAS-Analyzer/blacklist directory worked to get around the blacklist error.
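For anyone repeating this workaround, a minimal sketch is below. The helper name `make_blank_blacklist` is hypothetical and the install path in the example is an assumption; only the blacklist/hs1-blacklist.bed location comes from the error message above.

```shell
# make_blank_blacklist KAS_DIR ASSEMBLY   (hypothetical helper)
# Creates an empty <ASSEMBLY>-blacklist.bed inside KAS_DIR/blacklist.
make_blank_blacklist() {
  kas_dir=$1
  assembly=$2
  target="$kas_dir/blacklist/${assembly}-blacklist.bed"
  if [ ! -d "$kas_dir/blacklist" ]; then
    echo "No blacklist directory under $kas_dir" >&2
    return 1
  fi
  if [ -e "$target" ]; then
    # Never overwrite a real blacklist that is already in place.
    echo "$target already exists; leaving it untouched" >&2
    return 0
  fi
  # Create a zero-byte file so the blacklist step has something to open.
  : > "$target"
}

# Example (adjust the path to your own KAS-Analyzer checkout):
# make_blank_blacklist "$HOME/Programs/KAS-Analyzer" hs1
```

Note that an empty blacklist means no regions are filtered out at that step.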
However, I then immediately ran into a related error at the next stage:
Error: Unable to open file path_to/KAS-Analyzer/scripts/../chrom_size/hs1.chrom.sizes.bed. Exiting.
For anyone else experiencing this error, here's how I resolved it:
The files I created are fairly basic compared to the files for other genome builds (the files for the other genome builds contain information for various release updates, whereas for t2t the file I downloaded only has information for version). I have attached the files I used, in case they are of use to other KAS-Analyzer users who have experienced similar issues. hs1.chrom.sizes.zip
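If the attachment is not available, one possible way to derive chromosome sizes is directly from the reference FASTA, as sketched below. This is not necessarily how the attached files were made. The function name `fasta_chrom_sizes` is hypothetical, and the two-column name/length layout is an assumption: compare it with the files shipped in chrom_size for other builds before using it.

```shell
# fasta_chrom_sizes FASTA   (hypothetical helper)
# Prints "<sequence name><TAB><length>" for every record in an uncompressed FASTA file.
fasta_chrom_sizes() {
  awk '
    /^>/ {
      # New record: flush the previous one, then keep the first word of the header.
      if (name != "") printf "%s\t%.0f\n", name, len
      name = substr($1, 2)
      len = 0
      next
    }
    { len += length($0) }   # accumulate sequence length line by line
    END { if (name != "") printf "%s\t%.0f\n", name, len }
  ' "$1"
}

# Example (output file name and location are assumptions):
# fasta_chrom_sizes ~/Data/ref_genomes/t2t/hs1.fa > hs1.chrom.sizes
```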
Hi Ruitu,
I have another question/request regarding t2t compatibility.
The peakscalling command is also not compatible with t2t. The peak callers macs2 and epic2 seem to only require a genome to be specified for size estimation. Therefore, when using the t2t reference (i.e. the .bed input files have been generated for t2t), is it acceptable to use hg38 as an input for peakscalling, because it is close in size to t2t? Or will there be negative consequences for providing an inaccurate genome build version here?
Do you have any tips for how best to handle peakscalling for t2t?
Thank you for your help so far, Emmon
Hi, I don't think using hg38 as the input for peakscalling will have any bad consequences for accuracy. As far as I understand, the genome assembly you select just guides KAS-Analyzer in giving MACS2 or epic2 the corresponding effective genome size, which is only a rough calculation anyway. Best, Ruitu
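To see how close two builds are in size, the total length in a chrom sizes file can be summed as sketched below. The function name `genome_total_length` is hypothetical and a two-column name/length file is assumed. The total is only an upper bound on the effective (mappable) genome size that peak callers use, so treat it as a rough sanity check.

```shell
# genome_total_length CHROM_SIZES   (hypothetical helper)
# Sums the second column (sequence length) of a two-column chrom sizes file.
genome_total_length() {
  # %.0f avoids integer overflow in awk implementations with a 32-bit %d.
  awk '{ total += $2 } END { printf "%.0f\n", total }' "$1"
}

# Example (file names are assumptions):
# genome_total_length hs1.chrom.sizes
# genome_total_length hg38.chrom.sizes
```

When running MACS2 directly rather than through KAS-Analyzer, `macs2 callpeak -g` also accepts a plain number in place of a shortcut such as `hs`.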
Thanks for your help. These kas-seq and peakscalling issues are resolved now, so I will close this thread.