LosicLab / starchip

Detection of Circular RNA and Fusions from RNA-Seq
http://starchimp.readthedocs.io/en/latest/
MIT License

cannot find “Associated Gene Name” from ensembl biomart #9

YiweiNiu closed this issue 6 years ago

YiweiNiu commented 6 years ago

Hello,

I have a simple question. When trying to create "known gene families and known/common false-positive pairs", I cannot find "Associated Gene Name" under the GENE dropdown.

I guess it should be "Gene name"?

kippakers commented 6 years ago

Hi YiweiNiu,

Yes, "Gene name" should be fine; it looks like Ensembl may have changed their labels a bit. STARChip is going to try to match this gene name to the fusion-annotated gene names that it pulls from your GTF file and the fusion location. So if the names from Ensembl look similar to the gene_name field in your GTF, then you're good to go!
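If you want to check this before running STARChip, a short script can compare the two sets of names. This is a minimal sketch, not part of STARChip. It assumes the BioMart results were exported as a tab-separated file whose header includes a "Gene name" column, and that your GTF carries gene_name attributes in column 9 (as Ensembl and GENCODE GTFs do).

```python
import csv
import re
from typing import Iterable, Set

# Matches a gene_name attribute in GTF column 9, e.g. gene_name "TP53";
# The (?:^|;\s*) prefix avoids matching attributes that merely end in "gene_name".
GENE_NAME_RE = re.compile(r'(?:^|;\s*)gene_name "([^"]+)"')


def gtf_gene_names(lines: Iterable[str]) -> Set[str]:
    """Collect every gene_name value found in GTF-formatted lines."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        match = GENE_NAME_RE.search(fields[8])
        if match:
            names.add(match.group(1))
    return names


def biomart_gene_names(lines: Iterable[str], column: str = "Gene name") -> Set[str]:
    """Collect non-empty values of one column from a tab-separated BioMart export."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


def fraction_matched(gtf_names: Set[str], mart_names: Set[str]) -> float:
    """Fraction of BioMart gene names that also occur as a GTF gene_name."""
    if not mart_names:
        return 0.0
    return len(mart_names & gtf_names) / len(mart_names)
```

To use it, open your GTF and BioMart export (file names are up to you) and pass the open file handles to `gtf_gene_names` and `biomart_gene_names`; a fraction close to 1.0 suggests the naming schemes line up.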

YiweiNiu commented 6 years ago

Thank you for your quick response!

I've got another question about the STARChip paper.

In Fig. 4C, you compared different tools on healthy samples, on the premise that healthy samples should not contain many fusion transcripts. I generally agree with you, but there is a paper, "Recurrent chimeric fusion RNAs in non-cancer tissues and cells", that used SOAPfuse to profile fusions in healthy human samples and found 9778 fusions, 291 of which were seen in more than one sample. They "also used the same RNA samples that were processed for sequencing MSC muscle differentiation time points, randomly selected 40 candidate fusion transcripts and successfully validated 30 fusions by RT-PCR and traditional Sanger sequencing..."

According to this report, there are some true fusions in healthy samples. My point is: is STARChip too strict, giving up sensitivity for precision? What do you think?

Best, Yiwei Niu
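The recurrence figure quoted above (fusions "seen in more than one sample") amounts to counting distinct samples per gene pair. A minimal Python sketch, assuming fusion calls have already been reduced to hypothetical (sample, 5' gene, 3' gene) tuples; this is not how SOAPfuse or STARChip report their results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (5' partner gene, 3' partner gene)


def recurrent_pairs(
    calls: Iterable[Tuple[str, str, str]], min_samples: int = 2
) -> Dict[Pair, int]:
    """Return gene pairs called in at least `min_samples` distinct samples.

    `calls` holds (sample_id, gene_5prime, gene_3prime) tuples; repeated calls
    of the same pair within one sample are counted only once.
    """
    samples_per_pair: Dict[Pair, Set[str]] = defaultdict(set)
    for sample_id, gene5, gene3 in calls:
        samples_per_pair[(gene5, gene3)].add(sample_id)
    return {
        pair: len(samples)
        for pair, samples in samples_per_pair.items()
        if len(samples) >= min_samples
    }
```

Pairs are kept ordered (5' to 3'), so GENEA--GENEB and GENEB--GENEA are counted separately; collapsing them into one unordered pair is an equally defensible choice.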

kippakers commented 6 years ago

Hi again Yiwei,

Those are really interesting findings, but I find it hard to draw major conclusions from them. For one, they don't give important details on their methods; I want to know how deeply they sequenced the samples! Another criticism I have is this line from the abstract: "Over half of the recurrent fusions involve neighboring genes transcribing in the same direction." That sounds like either a circRNA or a read-through transcript to me. Finally, why didn't they validate anything with DNA??? That's the most obvious validation, and it would have eliminated a lot of false-positive sources.

Of course, it's hard to argue with mass spec validation. So I'd say they've shown that healthy tissues can have fusions, but they haven't demonstrated much about how frequently this occurs.
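The neighboring-genes point can be turned into a simple screen. A minimal Python sketch, assuming each partner gene is described by chromosome, 1-based start/end and strand (for example, taken from a GTF). The 200 kb `max_gap` cutoff is an arbitrary illustrative value; the function does not check whether another gene lies between the two partners, and it does not handle a circRNA within a single gene. It is not STARChip's actual filter.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # 1-based, inclusive (GTF convention)
    end: int
    strand: str  # "+" or "-"


def classify_pair(five_p: Gene, three_p: Gene, max_gap: int = 200_000) -> str:
    """Label a two-gene fusion call as 'readthrough-like', 'backsplice-like' or 'other'.

    readthrough-like: same strand, 3' partner lies just downstream of the 5' partner.
    backsplice-like:  same strand, 3' partner lies just upstream (reversed order),
                      the junction geometry a circRNA spanning both genes would produce.
    """
    if five_p.chrom != three_p.chrom or five_p.strand != three_p.strand:
        return "other"
    if five_p.strand == "+":
        # Plus strand: transcription runs toward higher coordinates.
        forward_gap = three_p.start - five_p.end
        reverse_gap = five_p.start - three_p.end
    else:
        # Minus strand: transcription runs toward lower coordinates.
        forward_gap = five_p.start - three_p.end
        reverse_gap = three_p.start - five_p.end
    if 0 <= forward_gap <= max_gap:
        return "readthrough-like"
    if 0 <= reverse_gap <= max_gap:
        return "backsplice-like"
    return "other"
```

Overlapping partners give a negative gap and fall through to "other"; a real screen would need to decide how to treat them.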

To your final question, my goal with STARChip was to develop a tool that focused on precision. There are a dozen fusion finders out there that sacrifice everything to get the highest sensitivity. For my projects, this was not too helpful. However, STARChip's read requirement settings can be set manually, and because it runs so quickly, it's easy to play with the settings to turn up sensitivity, turn down precision, and see what you get. Feel free to do so, and let me know what you find!
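To see the trade-off described here, you can count how many candidates survive as the read-support thresholds are lowered. A minimal Python sketch; the field and parameter names (`split_reads`, `spanning_pairs`, `min_split`, `min_span`) are hypothetical stand-ins, not STARChip's actual setting names, so check the STARChip documentation for the real ones.

```python
from typing import Dict, Iterable, List, NamedTuple


class Candidate(NamedTuple):
    pair: str            # e.g. "GENEA--GENEB"
    split_reads: int     # reads that cross the fusion junction itself
    spanning_pairs: int  # read pairs with one mate on each partner gene


def passing(cands: Iterable[Candidate], min_split: int, min_span: int) -> List[str]:
    """Names of candidates meeting both (hypothetical) read-support thresholds."""
    return [
        c.pair
        for c in cands
        if c.split_reads >= min_split and c.spanning_pairs >= min_span
    ]


def sweep(cands: List[Candidate], split_values: Iterable[int], min_span: int = 0) -> Dict[int, int]:
    """Number of surviving candidates for each min_split value.

    Lower thresholds keep more calls (higher sensitivity) at the cost of
    admitting more weakly supported, possibly false-positive, fusions.
    """
    return {t: len(passing(cands, t, min_span)) for t in split_values}
```

Comparing the surviving lists at each setting (for example, against fusions you can validate independently) shows where extra sensitivity starts costing precision in your own data.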

Cheers, and thanks for using STARChip! Kipp Akers

YiweiNiu commented 6 years ago

Hi Akers,

Thank you for discussing this with me! I'm not an expert on this; I just happened to see this paper. :)

They should have validated the fusions with DNA.

I quite agree with your goal in designing STARChip. By definition, fusions are generated by genomic rearrangements (structural variants), and the same structural variant may be hard to find across different individuals. So I'm conservative about fusion transcripts in non-cancer samples. I think there must be some fusions in healthy tissues, but they should be rare, or at least hard to detect with regular RNA-seq.

Thanks again for your reply and useful tool!

Best, Yiwei Niu