Closed: cgroza closed this issue 10 months ago.
Hi,
The FIND_SV_ON_REF step (annotation of SVs on the reference) does not work because the previous step, TE_TOWARD_GENOME (integration of TEs into the assembly), failed: no TE could be integrated into the assembled genome.
In the log file Snakefile_outsider.log:
•••
[SNK]--[Fri Mar 24 02:22:10 EDT 2023] PUT TE OUTSIDER ON GENOME...
•••
[Fri Mar 24 02:22:10 EDT 2023] LOG TASK AKA-018_out/log/TE_TOWARD_GENOME.out, AKA-018_out/log/TE_TOWARD_GENOME.err
INTEGRATE TE DB_ID TO GENOME...
INTEGRATE TE IN READS TO GENOME...
CHECKING TE INTEGRATED...
TOTAL TE: 1933 ; TE INTEGRATE ON GENOME TE_DB_ID : 0 ;
TOTAL TE: 1933 ; TE INTEGRATE ON NEO GENOME: 0 ;
[Fri Mar 24 02:26:14 EDT 2023] LOG TASK AKA-018_out/log/FIND_TE_ON_REF.out, AKA-018_out/log/FIND_TE_ON_REF.err
Could you tell me whether the files OUTSIDER/TE_TOWARD_GENOME/TRUE_POSITION_TE.fasta and OUTSIDER/TE_TOWARD_GENOME/TRUE_POSITION_TE_READS.fasta contain sequences or are empty?
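One quick way to answer this is to count the FASTA headers in each file. The sketch below is a hypothetical helper (`count_fasta_records` is not part of the pipeline); it assumes the paths are relative to your pipeline output directory, so adjust the prefix if your layout differs. It reports a missing file separately from an empty one.

```python
from pathlib import Path


def count_fasta_records(path):
    """Return the number of FASTA records (lines starting with '>') in `path`.

    Returns None if the file does not exist, so a missing file can be
    distinguished from an empty one (which returns 0).
    """
    p = Path(path)
    if not p.exists():
        return None
    with p.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Paths from the question above; assumed relative to the output directory.
    for f in (
        "OUTSIDER/TE_TOWARD_GENOME/TRUE_POSITION_TE.fasta",
        "OUTSIDER/TE_TOWARD_GENOME/TRUE_POSITION_TE_READS.fasta",
    ):
        n = count_fasta_records(f)
        status = "missing" if n is None else f"{n} sequence(s)"
        print(f"{f}: {status}")
```

A count of 0 for both files would match the `TE INTEGRATE ON GENOME ... : 0` lines in the log.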
This is the step that annotates the TEs on the reference. If you do not need it, you can deactivate it by setting the variable INTEGRATE_TE_TO_GENOME to False in the config.yaml file.
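Editing config.yaml by hand is the simplest route. If you want to script the change instead, here is a minimal sketch. It assumes INTEGRATE_TE_TO_GENOME appears as a top-level `KEY: value` line starting at column 0; `set_config_flag` is a hypothetical helper, not part of the pipeline.

```python
import re


def set_config_flag(text, key, value):
    """Return `text` with the first top-level `key: ...` line replaced by `key: value`.

    Assumption: the key is a flat YAML entry starting at column 0. Anything after
    the colon on that line (including a trailing comment) is overwritten.
    Raises KeyError if no such line exists.
    """
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*:.*$", re.MULTILINE)
    new_text, n = pattern.subn(lambda m: f"{key}: {value}", text, count=1)
    if n == 0:
        raise KeyError(f"{key} not found as a top-level entry")
    return new_text
```

Usage: `Path("config.yaml").write_text(set_config_flag(Path("config.yaml").read_text(), "INTEGRATE_TE_TO_GENOME", "False"))`, with `from pathlib import Path`. A plain regex avoids a YAML library dependency and preserves the rest of the file's comments and ordering.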
Sorry, Mourdas
Hi,
I was able to identify the problem. I assume that your assembly FASTA (the GENOME parameter) is in the classic wrapped format, i.e.:
>header_sequence
80 bp
next 80 bp
next 80 bp
>header_sequence2
....
For the integration, the sequences had to be in this format instead:
>header_sequence
ALL bp on one line
>header_sequence2
...
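To check whether an assembly uses the wrapped layout (a record whose sequence spans several lines), a small check like the one below could be run before launching the pipeline. This is a sketch; `has_wrapped_records` is a hypothetical helper, and blank lines are ignored.

```python
def has_wrapped_records(path):
    """Return True if any FASTA record in `path` has its sequence on more than one line."""
    seq_lines = 0  # non-blank sequence lines seen for the current record
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                seq_lines = 0  # new record: reset the counter
            elif line.strip():
                seq_lines += 1
                if seq_lines > 1:
                    return True  # stop at the first wrapped record
    return False
```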
You can either reformat the genome or get the available update, which also contains fixes for the TSD step and other parts of the pipeline.
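If you choose to reformat the genome, the idea is to join each record's sequence lines into one line. The sketch below streams the file rather than loading whole chromosomes into memory; `linearize_fasta` is a hypothetical helper, not part of the pipeline. Blank lines are skipped and any sequence lines before the first header are discarded.

```python
def linearize_fasta(src, dst):
    """Copy FASTA `src` to `dst`, putting each record's sequence on a single line."""
    with open(src) as fin, open(dst, "w") as fout:
        in_record = False
        for line in fin:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if in_record:
                    fout.write("\n")  # terminate the previous record's sequence line
                fout.write(line + "\n")
                in_record = True
            elif in_record:
                fout.write(line)  # append without a newline to join the wrapped lines
        if in_record:
            fout.write("\n")  # terminate the last record
```

Streaming keeps memory use constant regardless of chromosome length. An existing tool such as `seqkit seq -w 0 genome.fa > genome.oneline.fa` should produce the same layout.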
Mourdas
Hi,
I have managed to complete the TSD step. Now I am on the FIND_SV_ON_REF step. I am seeing this error now:
log.tar.gz
It seems I am really close to completing the pipeline. Is this the step where the annotation is moved to the reference genome?
My thanks,
Cristian