Open ofonov opened 7 years ago
Hi there,
I have the same question. I could not work out whether the tool can be run with just the FASTA transcriptome and the ORFs as input, or whether it needs something else. Can you please clarify?
Thanks!
In addition to the .fasta file, you should also provide the .gtf file. The .gtf file is used to extract the exon information for each transcript in the .fasta file, so it should contain the exon information for all transcripts in your .fasta file. You can use the .gtf files of the GENCODE dataset as a reference for the format. The ORFs are calculated by the lncScore program itself, so you do not need to supply them. The hexamer matrix and training dataset are built from the sequences alone, so they do not depend on the genome version.
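Since the .gtf file has to cover every transcript in the .fasta file, it can help to check that before running the tool. The following is only a minimal sketch, not part of lncScore: the function names are hypothetical, and it assumes that the FASTA header starts with the transcript ID (optionally followed by `|`-separated fields, as in GENCODE) and that this ID matches the `transcript_id` attribute in the GTF. How lncScore itself matches the two files is not stated in this thread.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Collect transcript IDs from FASTA header lines.

    Assumption: the ID is the first token of the header, cut at the
    first '|' (GENCODE-style headers).
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0].split("|")[0])
    return ids


def gtf_exon_counts(lines):
    """Count 'exon' records per transcript_id in GTF lines."""
    counts = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features carry the exon information
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            tid = match.group(1)
            counts[tid] = counts.get(tid, 0) + 1
    return counts


def missing_exons(fasta_lines, gtf_lines):
    """Return the FASTA transcript IDs that have no exon record in the GTF."""
    return sorted(fasta_ids(fasta_lines) - set(gtf_exon_counts(gtf_lines)))
```

With real files this could be called as `missing_exons(open("transcripts.fasta"), open("annotation.gtf"))`, where both file names are placeholders; an empty list means every transcript has at least one exon record.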
Ok. Thanks. Could you please suggest a program to convert a fasta into GTF format?
About the training dataset: the "dat" folder contains both the training set and the hexamer file for human and mouse, but only the XX_hexamer.tsv for other species. How can I produce the training set?
I don't think a FASTA file alone can be converted into GTF format, because the FASTA file does not contain any exon information, and I don't know of any program that does this. If you want to produce your own training model, you can use 'make_TrainingDat.py' in the 'tools' folder. You can also use 'make_hexamer_tab.py' in the 'tools' folder to produce your own xx_hexamer.tsv.
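To illustrate why a hexamer table depends only on the sequences and not on genome coordinates, here is a rough sketch of counting hexamer (6-mer) frequencies in a set of sequences. This is not the code of 'make_hexamer_tab.py', and it does not reproduce the layout of xx_hexamer.tsv, neither of which is shown in this thread; the function name and the `step` parameter are assumptions made for the example.

```python
from collections import Counter


def hexamer_frequencies(seqs, step=1):
    """Return the relative frequency of each hexamer in the given sequences.

    Hexamers containing anything other than A, C, G or T are skipped.
    `step` is the distance between consecutive windows (1 = every position).
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 5, step):
            kmer = seq[i:i + 6]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

The input is just a list of sequence strings, so the same table results regardless of which genome build the transcripts were annotated against.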
How can I produce the GTF file if I only have a newly assembled de novo transcriptome? Do I need a reference genome to locate the exons exactly for lncScore?
Sorry, I have no idea how to produce the GTF file for a newly de novo assembled transcriptome, as I have never done a de novo assembly for a non-model organism. To use lncScore, you have to provide the exon information. If you have the exact exon information, you can write a GTF file yourself, following the format of the GENCODE GTF files.
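If the exon coordinates are known, writing the GTF by hand could look roughly like the sketch below. It emits one tab-separated `exon` line per exon with the nine standard GTF columns and GENCODE-style `gene_id`, `transcript_id` and `exon_number` attributes. The function name and the `source` value are hypothetical, and which attributes lncScore actually reads is not stated in this thread, so treat this as a starting point rather than a guaranteed-compatible format.

```python
def gtf_exon_lines(transcript_id, gene_id, chrom, strand, exons, source="custom"):
    """Build GTF 'exon' lines for one transcript.

    `exons` is a list of (start, end) pairs, 1-based and inclusive as GTF
    requires. Exons are numbered in transcript order, i.e. from the highest
    coordinate downwards on the minus strand.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    lines = []
    ordered = sorted(exons, reverse=(strand == "-"))
    for number, (start, end) in enumerate(ordered, start=1):
        if start < 1 or start > end:
            raise ValueError(f"invalid exon coordinates: {start}-{end}")
        attributes = (
            f'gene_id "{gene_id}"; transcript_id "{transcript_id}"; '
            f"exon_number {number};"
        )
        # Columns: seqname, source, feature, start, end, score, strand, frame, attributes
        lines.append("\t".join(
            [chrom, source, "exon", str(start), str(end), ".", strand, ".", attributes]
        ))
    return lines
```

For example, `print("\n".join(gtf_exon_lines("T1", "G1", "chr1", "+", [(1, 100), (200, 300)])))` writes two exon records for a hypothetical transcript T1.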
Ok thanks a lot for your quick responses
Hi,
Could you please clarify the documentation of lncScore? I am new to the lncRNA prediction field and some things are not clear to me. In particular:
-f input files, --file=input files, enter transcripts in .bed or .fasta
Does that mean that one has to use a BED or FASTA file of the assembled transcriptome as input here? Or is it something else?
-g gtf file name, --gtf=gtf file name please enter your gtf files
Do you mean that annotation files should go here, e.g. Homo_sapiens.GRCh38.84.gtf?
-x hexamer matrix, --hex=hexamer matrix
Does it have to be built for different genome versions, e.g. GRCh38 or GRCh37, or is it universal?
-t training dataset, --train=training dataset
The same question as above: does it have to be built for different genome versions, e.g. GRCh38 or GRCh37, or is it universal?
Thank you in advance for the clarification. It would be really useful for me, and perhaps for other users of your software. I should state up front that I did read the paper, but I did not find answers to these questions there. Perhaps I missed something.