OliveiraDS-hub / ChimeraTE

A pipeline to detect chimeric transcripts derived from genes and transposable elements.
GNU General Public License v3.0

Error with bedfile #8

Closed: Ifengel closed this issue 10 months ago

Ifengel commented 10 months ago

Hello,

I got the following error:

The command was:

bedtools intersect -a /mnt/nvme0n1p1/ifengel/ChimeraTE/projects/chimera_test/tmp/TE_file.bed -wa -nonamecheck -b /mnt/nvme0n1p1/ifengel/ChimeraTE/projects/chimera_test/rep1/alignment/accepted_hits.bed

The error message was: Error: unable to open file or unable to determine types for file /mnt/nvme0n1p1/ifengel/ChimeraTE/ChimeraTE/projects/prueba_quimera/tmp/TE_file.bed

My .bed file has the following format (which is created by the pipeline itself):

8 chr1 8386825 - . -187082416
52 chr1 16776988 + . -178692920
91 chr1 33554408 - . -161917331
0 chr1 50329971 + . -145136573
0 chr1 83885790 - . -111585613
0 chr1 109051332 + . -86419645
15 chr1 125828927 + . -69642495
11 chr1 167772060 - . -27699727
0 chr1 184549326 + . -10922519
49 chr1 3145673 - . -192326175

I don't know how to fix this error.

OliveiraDS-hub commented 10 months ago

Dear @Ifengel, the columns in the bed file are not in the correct order. The gtf file for TEs that you provided is probably not well formatted.
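As a quick way to spot this kind of column-order problem, here is a minimal sketch of a sanity check. It assumes only the standard BED layout (chromosome in column 1, integer start and end in columns 2 and 3, tab-separated). The function name `check_bed` is hypothetical and not part of ChimeraTE.

```shell
# Hypothetical helper (not part of ChimeraTE): flag lines of a BED file whose
# first three tab-separated columns are not "chrom <int start> <int end>"
# with start <= end. Exits non-zero if any line is flagged.
check_bed() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 > $3 + 0 {
      printf "line %d looks malformed: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

For example, `check_bed tmp/TE_file.bed` would flag every row of the file quoted above, because column 2 holds `chr1` rather than a start coordinate.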

The right conversion from .out (output from RepeatMasker) to .gtf can be done with the following command line:

tail -n +4 RMfile.out | egrep -v 'Satellite|Simple_repeat|rRNA|Low_complexity|RNA|ARTEFACT' | awk -v OFS='\t' '{Sense=$9;sub(/C/,"-",Sense);$9=Sense;print $5,"RepeatMasker","similarity",$6,$7,$2,$9,".",$10}' > RMfile.gtf

You can try to use the RMfile.gtf as input to fix the error.
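To make the column mapping of the conversion command above easier to follow, here is a small sketch that runs the same pipeline on a made-up two-hit RepeatMasker excerpt. The wrapper name `rm_out_to_gtf` and the sample hits are illustrative assumptions, and `grep -Ev` is used as the equivalent of `egrep -v`.

```shell
# Hypothetical wrapper around the conversion command; reads a RepeatMasker
# .out file ($1) and writes the nine-column table to stdout.
rm_out_to_gtf() {
  tail -n +4 "$1" |                       # skip the three header lines
    grep -Ev 'Satellite|Simple_repeat|rRNA|Low_complexity|RNA|ARTEFACT' |
    awk -v OFS='\t' '{Sense=$9;sub(/C/,"-",Sense);$9=Sense;print $5,"RepeatMasker","similarity",$6,$7,$2,$9,".",$10}'
}

# Made-up example input: two header lines, one blank line, then two hits.
example=$(mktemp)
cat > "$example" <<'EOF'
   SW   perc perc perc  query     position in query    matching  repeat       position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family begin end (left) ID

  463   1.3  0.6  1.7   chr1      10001 10468 (248945954) +  (TAACCC)n  Simple_repeat  1 471 (0) 1
 3612  11.4 21.5  1.3   chr1      10469 11447 (248944975) C  L1MC5a     LINE/L1   (0) 1438 483 2
EOF

rm_out_to_gtf "$example"
```

With this input, the simple repeat is filtered out and only the L1MC5a hit remains, with the sequence name, start, and end taken from columns 5 to 7, the divergence from column 2, and the `C` strand rewritten as `-`.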

If the error persists, could you send me the first lines of your gtf for TEs?

Ifengel commented 10 months ago

Dear Daniel,

Thank you very much for your answer. It works!

I had two problems:

First, I downloaded a .out file from UCSC Genome Browser and this was a bad idea. When I downloaded the file directly from RepeatMasker, it worked.

In addition, I didn't use your command to obtain the gtf file.

Thanks!