teresa-m opened 2 years ago
As a first step, look for evolutionarily conserved interactions reported in the literature. For example, the interaction between U1 snRNA and the lncRNA MALAT1 is conserved between human and mouse (see the original PARIS paper). A handful of such interactions should be enough to demonstrate that the models are robust enough to detect cross-species conserved interactions.
Find homologs via the Ensembl REST API: https://rest.ensembl.org/ (homology endpoint: https://rest.ensembl.org/documentation/info/homology_ensemblgene)
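A minimal sketch of the ortholog lookup. It assumes the `homology/id/:species/:id` form of the endpoint (older Ensembl releases used `homology/id/:id`, so check which form the linked page currently describes) and the JSON shape of the `format=condensed` response given in the docstring. `homology_url`, `fetch_json` and `parse_orthologs` are hypothetical helper names; only `fetch_json` touches the network.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def homology_url(gene_id, species="homo_sapiens", target_species="mus_musculus"):
    """Build the ortholog query URL for one Ensembl gene ID (assumed endpoint form)."""
    params = urllib.parse.urlencode(
        {"target_species": target_species, "type": "orthologues", "format": "condensed"}
    )
    return f"{SERVER}/homology/id/{species}/{gene_id}?{params}"


def fetch_json(url):
    """GET a URL from the Ensembl REST server and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def parse_orthologs(payload, one2one_only=False):
    """Extract target gene IDs from a condensed homology response.

    Assumed shape: {"data": [{"id": ..., "homologies": [{"id": ..., "type": ...}, ...]}]}
    """
    ids = []
    for entry in payload.get("data", []):
        for hom in entry.get("homologies", []):
            # optionally keep only unambiguous one-to-one orthologs
            if one2one_only and hom.get("type") != "ortholog_one2one":
                continue
            ids.append(hom["id"])
    return ids


# Usage (network call, not run here):
# mouse_gene_ids = parse_orthologs(fetch_json(homology_url("ENSG...")))
```

Restricting to one-to-one orthologs keeps the cross-species comparison unambiguous, at the cost of dropping genes with several mouse copies.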
This is only possible for genes, not for transcripts. Thus, the most likely homologous transcript has to be selected separately, e.g. via pairwise alignment.
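One way to pick the matching transcript is to score each candidate against the query transcript with a global (Needleman-Wunsch) alignment and keep the best-scoring one. The sketch below is plain Python with illustrative scoring parameters (match +1, mismatch -1, gap -2, chosen as an assumption). It runs in O(n*m) time, so for full-length lncRNAs a compiled aligner such as Biopython's `PairwiseAligner` is the more practical choice. `nw_score` and `pick_best_transcript` are hypothetical names.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch, linear gap penalty) using two DP rows."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def pick_best_transcript(query_seq, candidates):
    """Return the ID of the candidate whose sequence aligns best to query_seq.

    candidates: dict mapping transcript ID -> spliced sequence.
    """
    return max(candidates, key=lambda tid: nw_score(query_seq, candidates[tid]))
```

For example, `pick_best_transcript("ACGTACGT", {"t1": "ACGTACGA", "t2": "TTTTTTTT"})` returns `"t1"`. A global score also penalizes large length differences, which helps to discard truncated isoforms.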
All transcripts of a gene can be retrieved via the API's overlap endpoint:
https://rest.ensembl.org/documentation/info/overlap_id
just like in the provided example, using the flag
feature=transcript
Afterwards, extract each transcript's sequence:
https://rest.ensembl.org/documentation/info/sequence_id
Make sure to get the spliced version via:
/sequence/id?type=cdna
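The two steps can be sketched as follows. This assumes that each record returned by `overlap/id/:id?feature=transcript` carries its parent gene ID in a `Parent` field (verify against a real response). Because the endpoint returns every transcript overlapping the gene's locus, transcripts of other genes (e.g. antisense genes) may appear, so they are filtered out here. `type=cdna` requests the spliced transcript, unlike `type=genomic`, which includes introns. All helper names are hypothetical, and the `fetch` argument lets the HTTP call be swapped out (e.g. for a cache or a test stub).

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def overlap_transcripts_url(gene_id):
    return f"{SERVER}/overlap/id/{gene_id}?feature=transcript"


def cdna_url(transcript_id):
    return f"{SERVER}/sequence/id/{transcript_id}?type=cdna"


def http_get(url, content_type):
    """Plain GET against Ensembl REST; the Content-Type header selects the response format."""
    req = urllib.request.Request(url, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def transcripts_of_gene(records, gene_id):
    """Keep only transcripts whose (assumed) Parent field is the queried gene."""
    return [r["id"] for r in records if r.get("Parent") == gene_id]


def transcript_sequences(gene_id, fetch=http_get):
    """Return {transcript_id: spliced cDNA sequence} for all transcripts of gene_id."""
    records = json.loads(fetch(overlap_transcripts_url(gene_id), "application/json"))
    seqs = {}
    for tid in transcripts_of_gene(records, gene_id):
        # text/plain returns the bare sequence without a JSON wrapper
        seqs[tid] = fetch(cdna_url(tid), "text/plain").strip()
    return seqs
```

The resulting dictionary can be passed directly as the candidate set when selecting the most similar transcript by alignment.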
Disadvantage: this requires many API calls and may take some time.
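One way to soften this is to cache every response, so that repeated genes or transcripts cost nothing, and to space out the remaining requests. The sketch below is a hypothetical wrapper around any `fetch(url)` function. The default of 15 requests per second is an assumed limit; check the current Ensembl REST documentation for the actual rate limits.

```python
import time


class ThrottledCache:
    """Hypothetical wrapper: caches responses by URL and rate-limits real requests."""

    def __init__(self, fetch, max_per_second=15.0):
        self._fetch = fetch                      # function: url -> response body
        self._min_interval = 1.0 / max_per_second
        self._last_call = None                   # monotonic time of the last real request
        self._cache = {}
        self.requests_made = 0

    def get(self, url):
        if url in self._cache:                   # cache hit: no API call
            return self._cache[url]
        if self._last_call is not None:
            wait = self._min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)                 # stay below max_per_second
        self._last_call = time.monotonic()
        body = self._fetch(url)
        self.requests_made += 1
        self._cache[url] = body
        return body
```

For longer runs, persisting the cache to disk (e.g. with the standard-library `shelve` module) avoids refetching everything after a restart.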
Idea: find a few examples of homologous RRIs (RNA-RNA interactions) in the human and mouse training data. The mouse homolog x of a human interaction can then be used to test whether it is correctly predicted by the 'human model', and vice versa.
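A sketch of this idea. It assumes that RRIs are given as pairs of gene IDs and that a human-to-mouse ortholog map (e.g. from the homology lookup) is available. `conserved_pairs` finds human RRIs whose orthologous pair is also an RRI in the mouse data. `cross_species_recall` then asks a model, represented here by a hypothetical `predict(rna_a, rna_b) -> bool`, how many of the mouse counterparts it recovers. Swapping the roles of human and mouse gives the "vice versa" direction. All names and the pair representation are assumptions for illustration.

```python
def conserved_pairs(human_rris, mouse_rris, human_to_mouse):
    """Return [((human_a, human_b), (mouse_a, mouse_b)), ...] for RRIs seen in both species.

    human_rris, mouse_rris: iterables of (gene_id, gene_id) pairs (order ignored).
    human_to_mouse: dict mapping a human gene ID to a list of mouse ortholog IDs.
    """
    mouse_set = {frozenset(pair) for pair in mouse_rris}
    matches = []
    for a, b in human_rris:
        for ma in human_to_mouse.get(a, []):
            for mb in human_to_mouse.get(b, []):
                if frozenset((ma, mb)) in mouse_set:
                    matches.append(((a, b), (ma, mb)))
    return matches


def cross_species_recall(matches, predict):
    """Fraction of mouse counterparts that a human-trained model predicts as interacting."""
    if not matches:
        raise ValueError("no conserved interactions to evaluate")
    hits = sum(1 for _, (ma, mb) in matches if predict(ma, mb))
    return hits / len(matches)
```

Since every matched pair is a known interaction in the target species, recall is the natural measure here; assessing false positives would additionally require non-interacting pairs.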
TODO: