RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Filtering for some GO terms #563

Closed beginner984 closed 2 years ago

beginner984 commented 2 years ago

Hi

I have a question that I have not been able to solve yet.

If I am only interested in lncRNAs related to the Angiogenesis GO term, how can I download such a list, please?

I searched for Angiogenesis in the search bar and some results came out. For instance, I see that 123 of them are lncRNAs, and I want to download them as a list, but I don't know how.


AntonPetrov commented 2 years ago

@beginner984 Thank you for your question! Currently you can download search results as FASTA or JSON files or as a list of RNAcentral IDs:

Screenshot 2021-10-01 at 16 20 01

I would like to note that when you search with a term like Angiogenesis, you will find all entries that mention angiogenesis, not only those annotated with an angiogenesis-related GO term.

You will also notice that most of the results in your search are microRNAs. This is because they are the best-annotated sequences. In fact, there are several groups that manually annotate microRNAs with GO terms using RNAcentral IDs.

I do have a suggestion that you may find helpful. If you search for pub_title:"angiogenesis" you will see additional entries where the term angiogenesis occurs in one of the associated publication titles.

https://rnacentral.org/search?q=pub_title:%22angiogenesis%22%20AND%20so_rna_type_name:%22LncRNA%

Screenshot 2021-10-01 at 16 12 54

The pub_title field is not searched by default because through experience we found that such results are not always relevant, so you might want to inspect the results with additional care.
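
For anyone who wants to build such a query URL in a script, here is a minimal Python sketch. It only assembles the same kind of search string used above (`pub_title` combined with `so_rna_type_name`) and percent-encodes it. The function name `build_search_url` is hypothetical, and the sketch assumes the web search accepts the query in the `q` parameter exactly as in the link above.

```python
from urllib.parse import quote


def build_search_url(pub_title_term, rna_type):
    """Assemble an RNAcentral text-search URL (hypothetical helper).

    Assumes the field names seen in the thread: pub_title and
    so_rna_type_name, joined with AND.
    """
    query = f'pub_title:"{pub_title_term}" AND so_rna_type_name:"{rna_type}"'
    # Keep ':' readable; spaces become %20 and double quotes become %22.
    return "https://rnacentral.org/search?q=" + quote(query, safe=":")


url = build_search_url("angiogenesis", "LncRNA")
print(url)
```

The resulting URL can be pasted into a browser, and the results can then be exported from the search page as usual.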

I hope this helps!

beginner984 commented 2 years ago

Thank you so much for your help

I am not very familiar with JSON or FASTA.

In a forum, somebody helped me download the results in a format like the one in the file linked below:

http://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/id_mapping/database_mappings/gencode.tsv

Is there any way to download the results in this format, please?

AntonPetrov commented 2 years ago

I am afraid the tsv format is not supported in the text search export because RNAcentral entries have a lot of data and the tsv format is not flexible enough to represent it.

What information are you trying to extract? Ensembl IDs for any of the matching sequences?

beginner984 commented 2 years ago

Thank you so much. The download gives IDs like URS0000000055, while I need the equivalent name, like ZNF451-AS1.

I don't know how to retrieve the results with that kind of identifier instead of RNAcentral identifiers.

AntonPetrov commented 2 years ago

Okay, I see. Currently there is no way to download this info from the web interface but it can be done using the id_mapping.tsv.gz file that contains all external IDs linked to each URS accession: http://ftp.ebi.ac.uk/pub/databases/RNAcentral/current_release/id_mapping/

You can search this file with a tool like grep or by writing a script. Alternatively, you can connect to the public Postgres database and write an SQL query but that also requires some programming.
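
As a rough illustration of the scripted route, here is a minimal Python sketch that looks up a list of URS IDs in a downloaded copy of id_mapping.tsv.gz. It rests on several assumptions that should be checked against the README on the FTP site: that the file is tab-separated with the columns URS ID, database, external ID, taxon ID, RNA type, gene name (in that order), and that exported IDs may carry a `_<taxid>` suffix that the first column of the mapping file does not. The function names and the input file layout (one ID per line) are hypothetical.

```python
import gzip
from collections import defaultdict


def load_urs_ids(path):
    """Read one RNAcentral ID per line, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def map_urs_to_external(mapping_path, urs_ids, database=None):
    """Collect (database, external_id, gene_name) tuples for each URS ID.

    Assumed column order: URS, database, external ID, taxon ID,
    RNA type, gene name. Verify against the FTP README before use.
    """
    # Assumption: drop a trailing "_<taxid>" so IDs match column 1.
    wanted = {urs.split("_")[0] for urs in urs_ids}
    matches = defaultdict(set)
    # Read gzip-compressed or plain text files transparently.
    opener = gzip.open if str(mapping_path).endswith(".gz") else open
    with opener(mapping_path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 3:
                continue  # skip malformed or empty lines
            urs, db, external_id = fields[0], fields[1], fields[2]
            gene = fields[5] if len(fields) > 5 else ""
            if urs in wanted and (database is None or db == database):
                matches[urs].add((db, external_id, gene))
    return matches
```

A call such as `map_urs_to_external("id_mapping.tsv.gz", load_urs_ids("ids.txt"))` returns a dictionary keyed by URS ID. The optional `database` argument restricts the output to one source, but the exact database labels used in the file need to be checked first. The file is streamed line by line, so it does not have to fit in memory.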

I can do this for you as a one-off task if you send me a URL of a search result that you want exported in this format.

beginner984 commented 2 years ago

Thanks a million

I have attached a list of lncRNAs here.

Is it possible to get from this list any lncRNAs related to angiogenesis and immune response, please? (Attachment: d.txt)

AntonPetrov commented 2 years ago

I am afraid that I don't understand the contents of the file. I thought you needed a mapping between URS ids and other accessions, but the file contains only ids like:

linc-HMGCS1:antisense
linc-FAM19A1-2:copy2

It does not look like these IDs are from RNAcentral.

beginner984 commented 2 years ago

Thank you so much

This might be nonsense, but do you know of any software that gives the pathways related to a list of lncRNAs?

I found the LncPath R package, which uses a network of lncRNA-mRNA interactions to get pathways.

Have you heard of any software that directly gives pathways or GO terms related to a list of lncRNAs, please?

In RNAcentral, if we search for a lncRNA we may find a related GO term, but this requires checking the entries one by one 🙁

Thanks a lot anyway

AntonPetrov commented 2 years ago

In general I would not recommend using RNAcentral for lncRNA GO annotations, because the coverage is very far from comprehensive at the moment. It's okay for microRNAs but not for lncRNAs.

I have not used it but you could try LncSEA for lncRNA GO enrichment analysis.

Sorry that I cannot help more!

beginner984 commented 2 years ago

Thanks a million

AntonPetrov commented 2 years ago

No worries, let us know if we can help with anything else.