RobertsLab / resources

https://robertslab.github.io/resources/

delimited GO Slim file? #1036

Closed emmats closed 3 years ago

emmats commented 3 years ago

Does anyone have a delimited GO Slim file for finding corresponding Slim terms if I have a list of GO terms? I found this website (http://geneontology.org/docs/download-ontology/#subsets) but none of the files are in an easily usable format.

Thanks!

sr320 commented 3 years ago

We have ... http://owl.fish.washington.edu/halfshell/bu-alanine-wd/17-07-20/GO-GOslim.sorted

emmats commented 3 years ago

Thanks. Do you know how recent that file is?

sr320 commented 3 years ago

made in 2017...

@shellytrigg might have something useful that is more recent?

kubu4 commented 3 years ago

@shellytrigg and I have used GSEABase in R to do this most recently.

Here's an example (R Project - download the entire repo to easily reproduce on your own computer):

https://github.com/RobertsLab/code/tree/master/r_projects/sam/20200207_cbai_DEG_GO-GOslims/scripts

emmats commented 3 years ago

@kubu4 I'm not sure what this line in your code means: then i moved it to the R library for GSEABase in the extdata folder

Nothing like this exists on my computer. Is this something that is in the repo?

kubu4 commented 3 years ago

I'm not entirely sure where R libraries are stored on Macs (I'm assuming that's what you're using). On my computer (Linux), here's the location where I put the goslim_generic.obo:

/home/sam/R/x86_64-pc-linux-gnu-library/3.4/GSEABase/extdata

emmats commented 3 years ago

I don't know why I couldn't find the folder when I was searching for it, but in case anyone wants to know: in the R console, type ".Library". This will give you the path to your R libraries.
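For anyone else hunting for that folder, here is a small sketch of two other ways to find it from the R console. It assumes GSEABase is already installed; `.libPaths()` and `system.file()` are base R functions, and the variable names are only illustrative.

```r
# .Library holds only the default system library; .libPaths() lists every
# library R searches, including per-user libraries.
lib_paths <- .libPaths()
print(lib_paths)

# system.file() returns the full path to a folder inside an installed package,
# or "" if the package (or the folder) cannot be found.
extdata_dir <- system.file("extdata", package = "GSEABase")
if (nzchar(extdata_dir)) {
  print(list.files(extdata_dir))
} else {
  message("GSEABase (or its extdata folder) was not found in: ",
          paste(lib_paths, collapse = ", "))
}
```

If `extdata_dir` is not empty, that is the folder where the goslim_generic.obo file mentioned above was placed.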

emmats commented 3 years ago

I've been struggling with version and package issues, but I finally got a list of GO slim terms; they just didn't correspond to any of my GO terms (count = 0 throughout). What I did was a lot simpler than your code: I gave R a list of GO terms (GO:....) called goterms and did the following:

goterms.cha<-as.character(goterms)
myCollection<-GOCollection(goterms.cha)
slimsdf<-goSlim(myCollection, slim, 'BP')

@kubu4 if you have any feedback, I'd appreciate it. I'm happy to provide input files, etc. if that helps with troubleshooting. If you have time for this, of course.

kubu4 commented 3 years ago

Sure, share an R Project - I'd be interested in playing around with stuff over the weekend.

I would expect your code to be a bit simpler, as mine was set up to download files, reformat those files, parse out GO terms, feed them to GSEABase, loop through all three GOslim categories, and format the output filenames - all much more than you need/want ATM.

emmats commented 3 years ago

Below is my code and attached is my input file.

I can't attach a .csv, so I have saved it as a .txt, even though the code below reads a .csv. GO numbers.txt

slim<-getOBOCollection('/Library/Frameworks/R.framework/Versions/4.0/Resources/library/GSEABase/extdata/goslim_generic.obo')

goterms<-read.csv('~/Documents/genome_sciences_postdoc/Boyd mesocosm/DDA Oct 2017/multiple spp with phaeo/GO numbers.csv')

goterms.cha<-as.character(goterms)
myCollection<-GOCollection(goterms.cha)
slimsdf<-goSlim(myCollection, slim, 'BP')

kubu4 commented 3 years ago

One difference that I see is that your GO terms are stored as a vector instead of a list. I won't have time to test today, but if you want to try something sooner, I'd test out that change.

emmats commented 3 years ago

I tried this, with the same results:

goterms<-read.csv('~/Documents/genome_sciences_postdoc/Boyd mesocosm/DDA Oct 2017/multiple spp with phaeo/GO numbers.csv')

go.list<-as.list(goterms)

goterms.cha<-as.character(go.list)
myCollection<-GOCollection(goterms.cha)
slimsdf<-goSlim(myCollection, slim, 'BP')

kubu4 commented 3 years ago

Here's modified code that works:

slim<-getOBOCollection('/Library/Frameworks/R.framework/Versions/4.0/Resources/library/GSEABase/extdata/goslim_generic.obo')

goterms<-read.csv('~/Documents/genome_sciences_postdoc/Boyd mesocosm/DDA Oct 2017/multiple spp with phaeo/GO numbers.csv', header = FALSE)

goterms.cha<-as.character(goterms$V1)
myCollection<-GOCollection(goterms.cha)
slimsdf<-goSlim(myCollection, slim, 'BP')

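The primary issue was how the data were read in and converted. Adding header = FALSE to the read.csv() function keeps the first GO term as data (instead of treating it as a column name) and names the single column V1.

Once this is done, extracting that column with goterms$V1 gives as.character() a plain vector, which produces the correctly formatted character object needed for the GOCollection function.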
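To see what header = FALSE changes, here is a minimal sketch in base R. It uses three GO IDs from this thread supplied as inline text rather than the real input file, and the variable names are only illustrative.

```r
# Three GO IDs, one per line, with no header row (inline stand-in for the file).
csv_text <- "GO:0006189\nGO:0006075\nGO:0019427\n"

with_header <- read.csv(text = csv_text)                 # header = TRUE is the default
no_header   <- read.csv(text = csv_text, header = FALSE)

nrow(with_header)    # 2: the first GO term was used as the column name
names(with_header)   # "GO.0006189" (the colon is replaced to make a valid name)
nrow(no_header)      # 3: every GO term is kept as data
names(no_header)     # "V1"

# Extract the column to get one character string per GO term.
ids <- as.character(no_header$V1)
print(ids)
```

With the default header = TRUE, one GO term is silently lost and there is no V1 column to extract.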

emmats commented 3 years ago

The GOCollection command still doesn't seem to recognize that I am giving it a list of GO terms. I tried using read.table instead of read.csv, and I tried removing the quote marks from the GO terms in go.list. Both yielded the same (0) results. This is what myCollection looks like:

myCollection

collectionType: GO
ids: (0 total)
evidenceCode: EXP IDA IPI IMP IGI IEP HTP HDA HMP HGI HEP ISS ISO ISA ISM IGC IBA IBD IKR IRD RCA TAS NAS IC ND IEA
ontology: CC MF BP

Here is what goterms.cha looks like:

head(goterms.cha) [1] "c(\"GO:0006189\", \"GO:0006075\", \"GO:0019427\", \"GO:0007340\", \"GO:0003779\", \"GO:0030036\", \"GO:0007015\", \"GO:0007188\", \"GO:0004017\", \"GO:0006526\", \"GO:0009073\", \"GO:0006421\", \"GO:0006422\", \"GO:0005524\", \"GO:0006754\", \"GO:0046034\", \"GO:0015986\", \"GO:0016887\", \"GO:0016255\", \"GO:0009058\", \"GO:0005509\", \"GO:0005516\", \"GO:1901137\", \"GO:0005975\", \"GO:0019752\", \"GO:0003824\", \"GO:0051301\", \"GO:0034605\", \"GO:0034599\", \"GO:0034620\", \"GO:0015995\", \"GO:0009507\", \"GO:0009535\", \"GO:0006325\", \"GO:0030261\", \"GO:0072583\", \n\"GO:0090114\", \"GO:0006241\", \"GO:0006535\", \"GO:0008234\", \"GO:0018063\", \"GO:0005737\", \"GO:0031122\", \"GO:0002181\", \"GO:0002183\", \"GO:0005856\", \"GO:0022625\", \"GO:0019521\", \"GO:0006207\", \"GO:0015074\", \"GO:0006310\", \"GO:0022900\", \"GO:0005783\", \"GO:0006888\", \"GO:0071949\", \"GO:0033539\", \"GO:0006633\", \"GO:0004324\", \"GO:0050660\", \"GO:0010181\", \"GO:0001732\", \"GO:0006002\", \"GO:0007186\", \"GO:0009298\", \"GO:0019673\", \"GO:0006094\", \"GO:0006006\", \"GO:0006537\", \"GO:0006542\", \"GO:0019264\", \"GO:0006096\", \"GO:0005794\", \n\"GO:0005525\", \"GO:0003924\", \"GO:0020037\", \"GO:0042025\", \"GO:0016787\", \"GO:0030176\", \"GO:0016021\", \"GO:0006886\", \"GO:0009097\", \"GO:0070122\", \"GO:0009695\", \"GO:0016301\", \"GO:0005871\", \"GO:0006629\", \"GO:0009089\", \"GO:0006430\", \"GO:0000470\", \"GO:0016020\", \"GO:0046872\", \"GO:0006431\", \"GO:0032259\", \"GO:0007018\", \"GO:0007017\", \"GO:0070143\", \"GO:0042776\", \"GO:0005743\", \"GO:0005759\", \"GO:0005741\", \"GO:0070125\", \"GO:0005739\", \"GO:0031514\", \"GO:0006397\", \"GO:0016459\", \"GO:0051287\", \"GO:0016151\", \"GO:0006807\", \n\"GO:0003676\", \"GO:0006913\", \"GO:0009132\", \"GO:0009116\", \"GO:0000786\", \"GO:0005634\", \"GO:0006730\", \"GO:0055114\", \"GO:0016491\", \"GO:0006098\", \"GO:0003755\", \"GO:0018160\", \"GO:0005543\", \"GO:0009853\", \"GO:0015979\", 
\"GO:0009773\", \"GO:0009772\", \"GO:0010207\", \"GO:0006779\", \"GO:0043161\", \"GO:0030163\", \"GO:0006457\", \"GO:0006606\", \"GO:0042026\", \"GO:0004722\", \"GO:0015031\", \"GO:0016567\", \"GO:0018298\", \"GO:1902600\", \"GO:0006782\", \"GO:0042823\", \"GO:0006090\", \"GO:0019253\", \"GO:0042176\", \"GO:0000027\", \"GO:0000028\", \n\"GO:0005840\", \"GO:0042254\", \"GO:0003723\", \"GO:0004252\", \"GO:0007165\", \"GO:0015708\", \"GO:0008295\", \"GO:0000103\", \"GO:0006412\", \"GO:0006414\", \"GO:0006099\", \"GO:0030433\", \"GO:0006511\", \"GO:0006065\", \"GO:0016192\")"
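The output above is a single long string that starts with c(, not a vector of GO IDs. A likely explanation, sketched here in base R with three of the GO IDs and illustrative variable names: calling as.character() on a list or data frame converts each column into one string containing R's printed form of that column.

```r
go_df <- data.frame(V1 = c("GO:0006189", "GO:0006075", "GO:0019427"),
                    stringsAsFactors = FALSE)

# A data frame is a list of columns. as.character() on a list returns one
# string per column, holding the deparsed column rather than its values.
collapsed <- as.character(as.list(go_df))
length(collapsed)             # 1
startsWith(collapsed, "c(")   # TRUE

# Extracting the column first gives one string per GO term.
ids <- as.character(go_df$V1)
length(ids)                   # 3
```

A single string of that form is not a valid GO ID, which is consistent with the collection reporting 0 ids.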

kubu4 commented 3 years ago

Can you please post the code you used?

emmats commented 3 years ago

slim<-getOBOCollection('/Library/Frameworks/R.framework/Versions/4.0/Resources/library/GSEABase/extdata/goslim_generic.obo')

# map GO terms to GO slims and select Biological Processes group

goterms<-read.table('~/Documents/genome_sciences_postdoc/Boyd mesocosm/DDA Oct 2017/multiple spp with phaeo/GO numbers.csv', header=F)

go.list<-as.list(goterms)
go.list<-noquote(go.list)

goterms.cha<-as.character(go.list)
myCollection<-GOCollection(goterms.cha)
slimsdf<-goSlim(myCollection, slim, 'BP')

kubu4 commented 3 years ago

Please try the code I posted previously:

slim<-getOBOCollection('/Library/Frameworks/R.framework/Versions/4.0/Resources/library/GSEABase/extdata/goslim_generic.obo')

goterms<-read.csv('~/Documents/genome_sciences_postdoc/Boyd mesocosm/DDA Oct 2017/multiple spp with phaeo/GO numbers.csv', header = FALSE)

goterms.cha<-as.character(goterms$V1)
myCollection<-GOCollection(goterms.cha)
slimsdf<-goSlim(myCollection, slim, 'BP')

Note that there is no use of the as.list() function.
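One way to catch this kind of problem before calling GOCollection() is a quick format check on the character vector. The sketch below is base R only; check_go_ids is a hypothetical helper, not part of GSEABase, and it only checks that each value looks like "GO:" followed by seven digits.

```r
# Hypothetical helper (not part of GSEABase): stop early unless every element
# looks like a GO ID, i.e. "GO:" followed by exactly seven digits.
check_go_ids <- function(ids) {
  ok <- is.character(ids) & grepl("^GO:[0-9]{7}$", ids)
  if (!all(ok)) {
    stop(sum(!ok), " of ", length(ids),
         " values do not look like GO IDs; first: ",
         substr(ids[!ok][1], 1, 40))
  }
  invisible(ids)
}

good <- c("GO:0006189", "GO:0006075")
bad  <- as.character(list(good))   # one deparsed string, as with as.list()

check_go_ids(good)                 # passes silently
res <- tryCatch(check_go_ids(bad), error = conditionMessage)
print(res)                         # error message naming the offending value
```

Running a check like this on goterms.cha would flag the as.list() version and pass the goterms$V1 version.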

emmats commented 3 years ago

Thanks! That did work. I didn't notice that change before.