Open snk5040 opened 2 years ago
You are better off doing this sort of thing with NCBI Datasets. That said, you can do it with Entrez Direct (EDirect) as follows:
$ elink -db gene -id 102888688 -target nuccore -name gene_nuccore_refseqrna | efetch -format fasta | head -n2
>NM_001290175.1 Pteropus alecto interferon induced with helicase C domain 1 (IFIH1), mRNA
AGAGCTGCGTCGCGAGAGAGCAGAGGCGGCTCCCTAGTCCCGGCCCCCGCGAGCACCGTAGAGTCAGAGG
$ elink -db gene -id 102888688 -target protein -name gene_protein_refseq | efetch -format fasta | head -n2
>NP_001277104.1 interferon-induced helicase C domain-containing protein 1 [Pteropus alecto]
MSNEYSADKRFRYLISCFRARVKMYIQVEPVLDYLTFLSADMKEQIQRTATTMGNINAAEQLLSTLEKGV
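Your command to get the mRNA is correct. The NM_ record it returns is a RefSeq mRNA, i.e. the spliced transcript, so it should not contain introns. What makes you say that the output sequence has introns?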
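Since you are calling EDirect from Python and want to run this over a list of gene IDs, here is a minimal sketch of how the two pipelines above could be wrapped. It assumes EDirect is installed and on your PATH and that Python 3.8+ is available. The names `LINKS`, `build_pipeline`, `fetch_fasta` and `fetch_all` are made up for illustration; only the `elink` and `efetch` arguments come from the commands in this thread.

```python
import shlex
import subprocess

# Target database and link name for each sequence type,
# copied from the elink commands shown in this thread.
LINKS = {
    "mrna": ("nuccore", "gene_nuccore_refseqrna"),
    "protein": ("protein", "gene_protein_refseq"),
}


def build_pipeline(gene_id, kind):
    """Return the 'elink | efetch' shell pipeline for one Gene ID."""
    target, link_name = LINKS[kind]
    elink = ["elink", "-db", "gene", "-id", str(gene_id),
             "-target", target, "-name", link_name]
    efetch = ["efetch", "-format", "fasta"]
    # shlex.join quotes each argument safely before the stages are piped together.
    return " | ".join(shlex.join(cmd) for cmd in (elink, efetch))


def fetch_fasta(gene_id, kind):
    """Run the pipeline and return the FASTA text.

    Needs EDirect on PATH and network access; raises
    subprocess.CalledProcessError if the pipeline exits non-zero.
    """
    result = subprocess.run(build_pipeline(gene_id, kind), shell=True,
                            capture_output=True, text=True, check=True)
    return result.stdout


def fetch_all(gene_ids):
    """Return {gene_id: {"mrna": fasta_text, "protein": fasta_text}}."""
    return {gid: {kind: fetch_fasta(gid, kind) for kind in LINKS}
            for gid in gene_ids}
```

Calling `fetch_all(["102888688"])` would run both pipelines for that gene. Unlike `os.system`, `subprocess.run` hands the FASTA text back to Python, so it can be written to a file or parsed per gene.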
Great, thanks
Hi everyone,
I would like to use a list of gene IDs to get, in FASTA format, the proteins encoded by those genes and the mRNA sequences without introns.
So far, I can get the protein sequence with this command: os.system('esearch -db gene -query "'+ "102888688" + ' [ID]" | elink -target protein -name gene_protein_refseq -cmd neighbor | xtract -pattern LinkSet -block IdList -element Id -block LinkSetDb -element Id | efetch -db protein -format fasta')
With this command I can get the mRNA with introns, which I don't want: os.system('elink -db gene -id ' + "102888688" + ' -target nuccore -name gene_nuccore_refseqrna | efetch -format fasta')