Can this pipeline recognize RNA-seq data in fq.gz format?

Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

Can this pipeline recognize RNA-seq data in fq.gz format? #777

Closed: changchuanjun closed this issue 6 months ago

changchuanjun commented 6 months ago

Hello! Thank you for your excellent work on the BRAKER pipeline; it greatly facilitates gene structure annotation of newly assembled genomes. I have a question: can this pipeline recognize RNA-seq data in fq.gz format? I ran the pipeline with the command below. [image: 1709693840239] I specified the directory containing the RNA-seq data; the files in that directory were in fq.gz rather than fastq format. However, when I ran the script, it did not seem to find or recognize the fq.gz files, so it downloaded the RNA-seq data online instead. I am therefore unsure whether this pipeline only recognizes the fastq file format and cannot handle fq.gz files. I look forward to your reply.

LarsGab commented 6 months ago

Hi,

BRAKER should find RNA-seq libraries ending in .fq.gz. Which BRAKER version are you using, and what are the file names of your RNA-seq libraries?

Best, Lars

changchuanjun commented 6 months ago

BRAKER3. My RNA-seq library file names end in .fq.gz.

LarsGab commented 6 months ago

Could you clarify which version number of BRAKER you are using? The version number can typically be found in the braker.log file; the current version is v3.0.8. Additionally, could you provide the full file names of your RNA-seq libraries? BRAKER has a specific naming convention for these files. For unpaired reads, the files should be named ID.fq.gz. For paired reads, the files should be named like ID_[1,2].fq.gz or ID_[R1,R2].fq.gz.
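The naming convention above can be checked before starting a run. The following is a minimal sketch, not BRAKER's own lookup code: `find_library` is a hypothetical helper that only tests the .fq.gz names mentioned in this reply and reports whether a directory holds a paired library, an unpaired library, or nothing usable for a given ID.

```shell
#!/bin/sh
# Hypothetical helper (not part of BRAKER): report whether directory $1 holds
# files for library ID $2 under the naming convention described above.
# Only the .fq.gz names mentioned in this thread are checked.
find_library() {
    dir=$1
    id=$2
    if [ -f "$dir/${id}_1.fq.gz" ] && [ -f "$dir/${id}_2.fq.gz" ]; then
        echo "paired"
    elif [ -f "$dir/${id}_R1.fq.gz" ] && [ -f "$dir/${id}_R2.fq.gz" ]; then
        echo "paired"
    elif [ -f "$dir/${id}.fq.gz" ]; then
        echo "unpaired"
    else
        echo "missing"
    fi
}
```

For example, `find_library /path/to/RNA-seq SRR4048288` would print `missing` for a directory that only contains files with extra text between the read number and the `.fq.gz` extension.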

changchuanjun commented 6 months ago

V3.0.8

My RNA-seq library file names are as follows, e.g. SRR4048288_1_paired_clean.fq.gz, SRR4048288_2_paired_clean.fq.gz

LarsGab commented 6 months ago

Thank you for the information. The issue appears to be the _paired_clean suffix in your file names. If you remove this suffix and rename them to SRAID_1.fq.gz and SRAID_2.fq.gz, it should resolve the problem. Best, Lars
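A rename along these lines could be scripted. This is a sketch under the assumption that every affected file ends in `_paired_clean.fq.gz`; `strip_suffix` is a hypothetical helper, and the loop only prints the planned `mv` commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the file name with "_paired_clean" removed,
# leaving names without that suffix unchanged.
strip_suffix() {
    name=$1
    case "$name" in
        *_paired_clean.fq.gz) echo "${name%_paired_clean.fq.gz}.fq.gz" ;;
        *) echo "$name" ;;
    esac
}

# Dry run over the current directory: print the planned renames.
# Remove the "echo" in front of mv to actually rename the files.
for f in *_paired_clean.fq.gz; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    echo mv -- "$f" "$(strip_suffix "$f")"
done
```

Run in the RNA-seq directory, this would propose renaming SRR4048288_1_paired_clean.fq.gz to SRR4048288_1.fq.gz.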

changchuanjun commented 6 months ago

OK, thanks for your reply. Now I have a new problem. I ran the gunzip *.fastq command to convert the fq.gz files to fastq files. When I ran the script below, BRAKER reported that the RNA-seq data was absent, and I am confused. [image: 1709732740744]

In the braker.log file, it reported "Couldn't find local RNA-Seq library for SRR21307590", but the library is actually present.

LarsGab commented 6 months ago

The likely cause of the issue is that the RNA-seq libraries are stored outside of your /home/ directory, which Singularity cannot access by default. To resolve this, you need to bind the /data/.../RNAseq_fastq directory to your Singularity container. This can be achieved by adding an additional -B option to your singularity exec command. For instance: singularity exec -B /path/to/RNA-seq:/path/to/RNA-seq -B $PWD:$PWD ./braker3.sif ...
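When several directories have to be bound, the -B options can be assembled rather than typed by hand. The sketch below assumes paths without spaces; `bind_args` is a hypothetical helper, and `/path/to/RNA-seq` is the placeholder from the command above. It only prints the resulting command line.

```shell
#!/bin/sh
# Hypothetical helper: print one "-B path:path" option per directory given,
# so each path is the same inside and outside the container.
# Assumes the paths contain no whitespace.
bind_args() {
    args=""
    for d in "$@"; do
        args="$args -B $d:$d"
    done
    echo "${args# }"   # drop the leading space
}

# Print the command instead of running it; the trailing "..." stands for the
# rest of the BRAKER call, which is elided in the thread as well.
echo singularity exec $(bind_args /path/to/RNA-seq "$PWD") ./braker3.sif ...
```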

changchuanjun commented 6 months ago

Yes, you are right. Thanks for the reminder. In addition, may I ask whether I can run ln -s /data/.../RNAseq_fastq . and then run singularity exec -B $PWD:$PWD ./braker3.sif ... instead of singularity exec -B /path/to/RNA-seq:/path/to/RNA-seq -B $PWD:$PWD ./braker3.sif ...?

LarsGab commented 6 months ago

In theory, it should work, but I haven't tested it myself. Using the -B option is likely the safer and more reliable approach.
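One way to judge the symlink approach beforehand is to check where the link actually points. The sketch below rests on an assumption that was not tested in this thread: a symlink is only usable inside the container if its target is also visible there. `resolves_under` is a hypothetical helper that uses `readlink -f` (GNU coreutils) to test whether a path resolves to a location under a given root.

```shell
#!/bin/sh
# Hypothetical helper: succeed if path $1, after resolving symlinks,
# lies under directory $2 (also resolved). Requires readlink -f.
resolves_under() {
    real=$(readlink -f -- "$1") || return 1
    root=$(readlink -f -- "$2") || return 1
    case "$real/" in
        "$root"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

If `resolves_under ./RNAseq_fastq "$PWD"` fails, the link targets a directory outside the working directory, and binding that directory explicitly with -B remains the more predictable option.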

changchuanjun commented 6 months ago

OK, I see. Thanks for your guidance and help.