Closed TamaraO closed 6 years ago
Dear Tamara,
the orfs.cit file contains the ORFs after filtering by FDR, the orfs.tsv file contains all ORFs (before filtering). I corrected the documentation on this. Do you really need a bed file for the unfiltered ORFs?
Sorry, the documentation on that was outdated, I corrected it. It's getType
instead of getOrfType
, and getTotalActivity()
instead of d.getActivities()[0]
.
You can use getTranscriptId
on the .index.cit
file created by IndexGenome
. What is interesting to you is what you can do with your ORFs, isn't it?
To get the ORF id as in the tsv file, you can go for
gedi -e ViewCIT -m bed -q '8+:27491575-27528868' -name 'd.transcript+"_"+d.type+"_"+d.orfid' -score '-Math.log10(d.combinedP)' price/hsv.orfs.cit
This also gives you the (uncorrected) p value (-log10). (rounded, as bed needs an integer field there)
I think I am going to expand the documentation on this in the next few days...
Best regards, Florian
Thank you for the detailed response. What is used for FDR cutoff? I am worried that if it is too stringent, it might remove some ORFs that are borderline "real". I was planning to do the filtering myself.
It is still giving me an error when trying to get ORFid like you recommended:
price$ gedi -e ViewCIT -m bed -q -name 'd.transcript+"_"+d.type+"_"+d.orfid' -score '-Math.log10(d.combinedP)' Sample.orfs.cit
2018-03-21 15:54:59.298 INFO Command: gedi -e ViewCIT -m bed -q -name d.transcript+"_"+d.type+"_"+d.orfid -score -Math.log10(d.combinedP) Sample.orfs.cit
2018-03-21 15:55:01.357 INFO Discovering classes in classpath
2018-03-21 15:55:01.767 INFO Preparing simple class references
2018-03-21 15:55:02.032 INFO Gedi 1.0.2 (JAR) startup
File d.transcript+"_"+d.type+"_"+d.orfid does not exist!
ViewCIT <Options> <file>
Options:
-l Only list the number of elements per reference
-q <location> Only output elements overlapping the given query (i.e. whole reference or genomic region)
-o <file> Specify output file (Default: stdout)
-m <mode> output mode: Bed/Location/Cit/Genepred/Gtf (Default: location)
-t <js/name> name of a transformer or javascript function body transform reference genomic region (variable d is current data, ref the reference, reg the region; either return a data object or a reference genomic region object!)
-name <js> javascript function body to generate name (variable d is current data, ref the reference, reg the region)
-score <js> javascript function body to generate score (variable d is current data, ref the reference, reg the region)
-filter <js>\t javascript function body returning true for entries to use (variable d is current data, ref the reference, reg the region)
-p Show progress
-h Show this message
-D Output debugging information
Still not sure how to extract the ORF id into the BED file...
You forgot to remove the -q
when calling ViewCIT!
FDR cutoff can be specified when calling price with -fdr 0.1
(10% is the default)
Florian
Perfect!! It's working now! Thank you!
Hi Florian,
In the resulting bed file, the ORFs with one exon do not have values in columns 7-12:
1 567020 567161 ENST00000414273.1_1_ncRNA_0 3 +
1 763050 763155 ENST00000608189.4_1_ncRNA_0 1 +
1 2487899 2488094 ENST00000355716.4_1_uORF_3 3 +
1 2488103 2494712 ENST00000355716.4_1_CDS_0 7 + 0 0 0 8 69,109,126,156,91,143,32,126, 0,1061,1678,3158,3959,5008,6200,6483,
1 2494853 2494946 ENST00000463471.6_1_ncRNA_12 1 +
1 3541576 3545167 ENST00000378344.6_1_Variant_0 12 + 0 0 0 5 261,92,177,154,195, 0,411,700,2487,3396,
1 2323330 2328573 ENST00000378513.7_1_uoORF_3 5 + 0 0 0 3 67,88,19, 0,3892,5224,
1 2327229 2334563 ENST00000605895.5_3_CDS_0 2147483647 + 0 0 0 6 81,105,100,79,136,90, 0,1325,3624,5066,6416,7244,
This causes tools like BEDTools to crash. It would be great if the bed file was consistent.
Thank you!
Tamara
I fixed ViewCIT. If you pull the most recent version from github (and use buildworkspace.bash
to build it), BETTools should be able to handle the files!
Best,
Florian
Hi Florian,
Regarding the FDR and filtering for the final .cit file. If I set -fdr 1
flag, I still don't get all of the contents of .tsv in the .cit file, and many ORFs are filtered out. Why is that? How can I get all (or most) of them?
Thank you!
Tamara
Hi Tamara,
I guess that many of the ORFs will receive a corrected p value of 1, and I am filtering using a strict <. You could try specifying -fdr 2
(or anything else greater than 1)!
Best, Florian
Hi,
I am trying to use ViewCIT to extract various features from the .cit file. For example, I would like to get a bed file for all the ORFs. I used this command:
gedi -e ViewCIT -m Bed -name 'd.getType()' Sample.orfs.cit > Sample.bed
It generated a bed file, but there are a lot fewer lines in the bed file than the tsv file:
Why is that? How do I get a bed file for all ORFs?
Then, I try to use another example from your Wiki:
Could you, please, provide a more detailed explanation of how to use ViewCIT? In particular, how to use
-name 'd.getTranscriptId()'
? What else can I write instead of getTranscriptId?In particular, I would like to get a BED file for all ORFs, but in the Name column, I would like to have the ORF Id instead of the ORF type. How can I do that?
Thank you!