Closed sangho1130 closed 3 months ago
Hi @sangho1130! Thank you for using FREDDIE.
- Our data are nanopore long reads and previously we used minimap2 for the alignment. Here, when I feed them to freddie star, which uses starlong to align long reads, I get the following error message in a log file.
<_EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length 43cf7361-87a6-4fac-ab84-0c16836216fb_REV_PS=857_PE=895_AE=916_T=45_X=AAAAAAAAAAAAATTAAGAACTCTCGAGAGCCTAGAAATCAGA_Q=22.91hur0f (...sequence excluded) SOLUTION: fix your fastq file>
I'm wondering if freddie is compatible with nanopore data, or whether I should skip the alignment step and use pre-aligned bam files in this case.
Regarding the issue in the STAR step: this error usually occurs when the number of nucleotides in a read does not match the length of its quality string (which can happen for various reasons, such as adapter trimming). However, since you already have the data aligned with minimap2, you can safely skip this step. FREDDIE gives you this option through the -f flag. You could use:
## Create a file listing the BAM paths as they will appear inside the Docker container
$ ls /path/bam-files/*bam
/path/bam-files/sample1_sorted.bam
/path/bam-files/sample2_sorted.bam
$ ls /path/bam-files/*bam | awk -F "/" '{print "/home/bams/"$NF}' > $PWD/files.txt
## Example of file
$ cat $PWD/files.txt
/home/bams/sample1_sorted.bam
/home/bams/sample2_sorted.bam
$ time docker run --rm -u $(id -u):$(id -g) -w $(pwd) -v $PWD:/home/freddie -v /path/bam-files/:/home/bams/ \
freddie string -o /home/freddie/K562 \
-a /home/freddie/db/gencode.v36.annotation.gtf \
-f /home/freddie/files.txt
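If you would rather track down the reads that STAR is complaining about instead of skipping the step, a minimal sketch along these lines may help. It assumes an uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines); `check_fastq_lengths` is a hypothetical helper name and is not part of FREDDIE or STAR.

```shell
# Report FASTQ records whose quality string length differs from the sequence length.
# Assumption: plain 4-line-per-record FASTQ (header, sequence, "+", quality).
check_fastq_lengths() {
    awk 'NR % 4 == 1 { name = $0 }            # header line of the record
         NR % 4 == 2 { seqlen = length($0) }  # sequence line
         NR % 4 == 0 && length($0) != seqlen {
             # quality line whose length does not match the sequence
             printf "%s\tseq=%d\tqual=%d\n", name, seqlen, length($0)
         }' "$1"
}
```

Calling `check_fastq_lengths reads.fastq` prints one line per mismatching read, with the read header and both lengths; no output means every record is consistent under the four-line assumption.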
Important: when you skip the alignment step and pass BAMs from minimap2 with -f, make sure that your reference files (reference genome and GTF file) use the same chromosome naming convention (for example, both with "chr" or both without).
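One way to check the naming convention before running the pipeline is to compare the sequence names in a BAM header with the first column of the GTF. The sketch below assumes the BAM header reflects the reference genome used by minimap2 and that samtools is available to print it; the function names are hypothetical helpers, not FREDDIE commands.

```shell
# Print sequence names from SAM header text on stdin (SN: tags of @SQ lines).
sam_header_chroms() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' | sort -u
}

# Print chromosome names used in a GTF file (first column, skipping # comments).
gtf_chroms() {
    grep -v '^#' "$1" | cut -f 1 | sort -u
}

# List names that appear in the header on stdin but not in the GTF given as $1.
chroms_missing_from_gtf() {
    hdr=$(mktemp)
    sam_header_chroms > "$hdr"
    gtf_chroms "$1" | comm -23 "$hdr" -
    rm -f "$hdr"
}
```

For example, `samtools view -H /path/bam-files/sample1_sorted.bam | chroms_missing_from_gtf gencode.v36.annotation.gtf` lists the header names with no match in the GTF. If every chromosome shows up there, the two files most likely differ in the "chr" prefix.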
- I was playing around with some of my short read data with freddie, but I am stuck at the "chimeric" step. I think a bed4 file which is required is missing and not generated(?) in the previous step. All I got from "string" were <sample~.freddie.gtf> and log files. Are there additional commands after running string?
Regarding this point, it depends on what you want to identify. The BED4 file is not generated by the pipeline: it must be provided by the user and should list the events they want to find. In the Practical workflow, we use retrocopies from our RCPedia database. However, if you are interested in another retroelement, you can use the BED file from RepeatMasker.
I have included a link to download the zipped BED file with all the elements; it is important that you filter those of interest and format them as required by the tool. For the file that you can download from that link, this would be the process to handle it:
$ gunzip rmsk.bed.gz
$ head rmsk.bed
chr1 67108753 67109046 L1P5 1892 +
chr1 8388315 8388618 AluY 2582 -
chr1 25165803 25166380 L1MB5 4085 +
chr1 33554185 33554483 AluSc 2285 -
chr1 41942894 41943205 AluY 2451 -
chr1 50331336 50332274 HAL1 1587 +
chr1 58719764 58720546 L2a 1393 +
chr1 75496057 75497775 L1MA9 5372 +
chr1 92274205 92275925 L2 536 +
chr1 100662981 100669120 L1PA4 25118 -
$ fgrep Alu rmsk.bed | cut -f 1-4 | sort -k1,1 -k2,2n | head
chr1 26790 27053 AluSp
chr1 31435 31733 AluJo
chr1 33465 33509 Alu
chr1 35366 35499 AluJr
chr1 39623 39924 AluSx
chr1 40628 40729 AluSz6
chr1 51584 51880 AluYj4
chr1 61862 62160 AluSc
chr1 76892 77201 AluSz
chr1 78285 78421 AluJr
$ fgrep Alu rmsk.bed | cut -f 1-4 | sort -k1,1 -k2,2n > rmsk.bed4
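Before passing the result to the chimeric step, it may be worth sanity-checking the file. The sketch below assumes that "BED4" here means four tab-separated columns (chromosome, start, end, name) with start smaller than end; FREDDIE's exact requirements are not confirmed in this thread, and `validate_bed4` is a hypothetical helper name.

```shell
# Print every line that is not a well-formed BED4 record and return non-zero
# if any is found. Assumption: BED4 = chrom, start, end, name, tab-separated.
validate_bed4() {
    awk -F '\t' '
        NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Running `validate_bed4 rmsk.bed4 && echo "looks like BED4"` prints the confirmation only when every line passes these checks.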
Thank you once again for getting in touch.
@rmercuri Thank you so much!!
@rmercuri Oh, one last thing if I may, I think the "databases" is offline (https://github.com/galantelab/freddie?tab=readme-ov-file#databases). Are there other repositories that I can access those files?
Thanks!
@sangho1130 We updated the files, and the links changed as a result. They are working again now. Sorry about that!
Thank you so much @rmercuri !
No problem! If you need any help with the pipeline, feel free to contact me via email or here. Additionally, if possible, please share a review of your experience using the tool with your data (did it meet your expectations? haha).
Hi, thanks for developing the freddie tool! I was trying to apply freddie to our cancer dataset, but I'm currently running into some issues.
1) Our data are nanopore long reads and previously we used minimap2 for the alignment. Here, when I feed them to freddie star, which uses starlong to align long reads, I get the following error message in a log file.
<_EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length @43cf7361-87a6-4fac-ab84-0c16836216fb_REV_PS=857_PE=895_AE=916_T=45_X=AAAAAAAAAAAAATTAAGAACTCTCGAGAGCCTAGAAATCAGA_Q=22.91hur0f (...sequence excluded) SOLUTION: fix your fastq file>
I'm wondering if freddie is compatible with nanopore data, or whether I should skip the alignment step and use pre-aligned bam files in this case.
2) I was playing around with some of my short read data with freddie, but I am stuck at the "chimeric" step. I think a bed4 file which is required is missing and not generated(?) in the previous step. All I got from "string" were <sample~.freddie.gtf> and log files. Are there additional commands after running string?
Thanks for your help!