Open cardonaaaa opened 2 years ago
Hey,
Thanks for trying TEPIC. The first thing I noticed is that your GTF files do not seem to contain a "gene" entry, which is the default the script is looking for. If you want to annotate based on transcripts, you have to run the pipeline with the -t parameter.
I also noticed that your "chromsome" identifiers are not numeric. This is something we have never used or tested. Does your narrow peak file contain the same identifiers?
Please let me know if you still run into problems once you use the -t.
Cheers, Florian
Hey @Florian411
I used the -t parameter, it's at the end of the command. Anyway I ran it again and the same error keeps popping. My narrow peak file contains the same identifiers.
So it may be due to the scaffold names not being numeric.
Thanks for answering so quickly!
Could you share with me a preview (maybe first 10 lines) of each of your files via gdrive? I can try to debug a bit further then.
Sure, the full GTF files are in the first comment. I leave the first 10 lines of each file below. Tell me if you have any problems downloading them.
Edit: One thing to take into account is that one of the genomes I'm using is mixed, thus having both chromosomes and scaffolds that have different prefix:
Thanks for the files! I managed to download. Thanks also for the hint regarding the genome file. I'll see what i can find and get back asap.
I just tried it with your example files and it works fine on my end.
bash TEPIC.sh -g ~/dataToTest/example.fasta -b ~/dataToTest/example2.narrowPeak -o TestFiles -p ../PWMs/2.1/JASPAR_PSEMs/JASPAR2018_CORE_vertebrates_non-redundant_pfms_transfac.PSEM -c 4 -a ~/dataToTest/first.gtf -t
is the command i used to obtain the attached annotation. I don't get anything using the second.gtf file. TestFiles_TEPIC_11_12_22_21_16_41_832407535_Decay_Peak_Features_Affinity_Gene_View_Filtered.txt
Have you tried with the example files? As it works with those my guess is that there might be a problematic entry somewhere deeper down in the gtf that causes the annotation script to break.
Well that's weird. I tried with the same files as you and I'm still getting the error. Could it be something related with my python version or sth similar? Anyway it should not be an issue with the GTF, since the one you used is actually the entire file (first.gtf)
Another thing I notice about your output is that gene IDs only contain a slice of the full ID, in this case "Nitab4", which is not enough to identify the locus. Might be because of the "." in the transcript IDs (Nitab4.5_XXXXX) (?)
I'm gonna keep trying to detect what is producing my error, will be back. Thanks for the help again!
I am on python version 3.8.10. Possible that it makes a difference.
Yes, good guess. The annotation only considers the part before the "."
Let me know what you find or if I can help somehow else.
Hi again, I managed to make it work re-downloading the GFF and transforming it to GTF. So it was probably an issue with the GTF being corrupted, as you predicted. Anyway I'm not able to identify my genes because the annotation considers only the part before "." Is it possible to solve that issue so it considers the full annotation? Maybe making it consider just the last "." of the annotation if there's more than one instance (?)
I'll try to remove the "." from the narrowPeak, FASTA and GTF meanwhile, and see how it goes. But this could be a good feature to improve so future users with the same issue do not need to modify their original files.
Hi, I'm trying to run TEPIC v2.2 with the following command:
./TEPIC.sh -g $ref/Ntab-BX_AWOK-SS.fa -b $outdir/14-5_out2_macs2_call_peak_peaks.narrowPeak -o $outdir/14-5_TEPIC_AWOK -p $psem_dir/Nicotiana_IDs_PSCM_JASPAR_2018.PSEM -a $ref/with_desc_Ntab-BX_AWOK-SS_Basma_rnaseq.gtf -c 60 -t
And getting this error:
I tried using two different GTFs (since I'm carrying out my analysis with two different reference genomes) and I'm getting the same error message with both of them. I'd like to know if it's an issue with my GTFs not being "gencode format" or something about the script. None of my files contain the "chr" prefix.
1st GTF: https://drive.google.com/file/d/1hDwjUDY33nUkN_Zpp6YiKaAhmLsjn9p_/view?usp=share_link 2nd GTF (the one used in the shown command): https://drive.google.com/file/d/1pAwSjREFfQ_r7CftXOhrIJcpAZrXh0-J/view?usp=share_link
Python version: 3.7.8 R version: 4.0.4
Thanks in advance!