RobertsLab / resources

https://robertslab.github.io/resources/

GO-MWU (enrichment tool) error with 2019 crab data #907

Closed grace-ac closed 4 years ago

grace-ac commented 4 years ago

Assigning @yaaminiv because we used GO-MWU in class at FHL last summer, but anyone can jump in

Using GO-MWU (GitHub repo for GO-MWU) to get gene enrichment for crab 2019 differentially expressed genes - comparing infected and uninfected crabs.

Crab repo directory for using GO-MWU: here

Some blurbs about GO-MWU from their GitHub repo: Rank-based Gene Ontology Analysis with Adaptive Clustering

"tests whether the genes belonging to a certain GO category are significantly bunched up near the top or the bottom of the global ranked list of genes, instead of being spread evenly over it." "major advantage of this approach is that the experimenter does not have to impose an arbitrary threshold for initial selection of "significant genes", and thus the whole dataset can be used to gain information"

GO-MWU needs several files, 2 of which are made of crab data:

  1. list of gene IDs with GO terms as a .tab file
  2. list of gene IDs with a measure of interest (I did log2FC) as a .csv file

The files that I made from those specifications:

  1. 2019-crab-GO-annot.tab
  2. 2019-crab-GO-log2fc.csv

The other files are in the GO-MWU directory in project-crab/analyses/GO-MWU and were copied from GO-MWU GitHub repo, as per their README.md specifications.

Script to use GO-MWU with my data files: 2019-crab-infection-GO_MWU.R (copied from the GO-MWU .R script, GO_MWU.R)

Error that I keep getting:

Error in read.table(inname, sep = "\t", header = T, check.names = F) : no lines available in input

The "input" file is the 2019-crab-GO-log2fc.csv as assigned in the script, so I'm thinking that that may be where something is going wrong...

Notes: I have installed perl on my computer and I am pretty confident that the path to perl in my script is correct.
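One way to double-check the perl path from within R is sketched below. It uses only base R (`Sys.which()` and `system2()`); the variable names are made up for illustration, and it assumes that whatever path it reports is what would be passed as `perlPath`.

```r
# Look up perl on the PATH; Sys.which() returns "" when nothing is found
perl_path <- unname(Sys.which("perl"))

if (nzchar(perl_path)) {
  # Ask perl for its version to confirm the executable actually runs
  version_out <- system2(perl_path, "-v", stdout = TRUE)
  message("perl found at: ", perl_path)
  message(grep("This is perl", version_out, value = TRUE))
} else {
  message("perl was not found on the PATH; use the full path to the perl executable instead")
}
```

If this reports a path, using that absolute path is likely less fragile than a long relative path built from `../` segments.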

I have also compared my 2019-crab-GO-log2fc.csv to the one used in the EIMD class (2019-07-11-Zostera-Table-of-Significance-Measures.csv), and it looks the same...

Any ideas on how to fix this error? Or what this error means? Other things I'm doing incorrectly?

yaaminiv commented 4 years ago

@grace-ac Can you share the script you used to make these files?

grace-ac commented 4 years ago

https://github.com/RobertsLab/project-crab/blob/master/scripts/041720-DEGlist_annotate-enrich.Rmd

yaaminiv commented 4 years ago

The error you keep getting is referring to a tab-delimited file:

Error in read.table(inname, sep = "\t", header = T, check.names = F) : no lines available in input

Which would be this: https://github.com/RobertsLab/project-crab/blob/master/analyses/GO-MWU/2019-crab-GO-annot.tab

Looking at that file, you still have quotes around each entry:

Screen Shot 2020-04-20 at 11 47 38 AM

Try adding quote = FALSE to your write.table code:

write.table(crab_gene_go, "../analyses/GO-MWU/2019-crab-GO-annot.tab", row.names = FALSE, quote = FALSE)

grace-ac commented 4 years ago

thanks! I'll try that

grace-ac commented 4 years ago

dang. retried code with new 2019-crab-GO-annot.tab and got the same error.

yaaminiv commented 4 years ago

@grace-ac The tab-delimited file shouldn't have a header either!

grace-ac commented 4 years ago

ooh i'll try that!

yaaminiv commented 4 years ago

From GO-MWU repo:

Screen Shot 2020-04-20 at 12 03 49 PM
grace-ac commented 4 years ago

New file without quotes or header: https://github.com/RobertsLab/project-crab/blob/master/analyses/GO-MWU/2019-crab-GO-annot.tab

still same error... maybe it has something to do with the "NA" in some of my rows...

grace-ac commented 4 years ago

or maybe something with the fact that my list has isoforms...

grace-ac commented 4 years ago

just realized i never specified "sep = '\t'" when i wrote out .tab file. will see if that makes difference now

yaaminiv commented 4 years ago

or maybe something with the fact that my list has isoforms...

I don't think this is isoform related. The error is claiming there are no lines available in the input tab-delimited file. Are you sure it is tab-delimited?
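For reference, base R's `read.table()` raises exactly this message whenever the file it is handed contains no lines at all. A minimal sketch that reproduces it with an empty temporary file (the file and variable names here are hypothetical, not part of GO-MWU):

```r
# Create an empty file to stand in for whatever file read.table() was given
empty_file <- tempfile(fileext = ".tab")
file.create(empty_file)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    read.table(empty_file, sep = "\t", header = TRUE, check.names = FALSE)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
print(file.size(empty_file))  # a size of 0 bytes is the giveaway
```

So when this error shows up, it can help to check the size of every file involved in the run, including any intermediate files written to the working directory.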

yaaminiv commented 4 years ago

just realized i never specified "sep = '\t'" when i wrote out .tab file. will see if that makes difference now

Yup that was going to be my next question. That would definitely make a difference!
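To see why the separator matters: `write.table()` defaults to `sep = " "`, so a file written without `sep = "\t"` is space-delimited even if it is named `.tab`. A small sketch with a made-up two-row stand-in for `crab_gene_go` (the gene IDs and GO terms are invented for illustration):

```r
# Hypothetical stand-in for the real annotation data frame
crab_gene_go <- data.frame(
  gene_id = c("gene1", "gene2"),
  go_terms = c("GO:0008150;GO:0003674", "GO:0005575"),
  stringsAsFactors = FALSE
)

space_file <- tempfile(fileext = ".tab")
tab_file <- tempfile(fileext = ".tab")

# Default separator: a single space
write.table(crab_gene_go, space_file,
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Explicit tab separator
write.table(crab_gene_go, tab_file, sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

space_lines <- readLines(space_file)
tab_lines <- readLines(tab_file)

print(space_lines)  # columns separated by spaces
print(tab_lines)    # print() shows each tab as \t
```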

grace-ac commented 4 years ago

new file: https://github.com/RobertsLab/project-crab/blob/master/analyses/GO-MWU/2019-crab-GO-annot.tab

write.table code:
write.table(crab_gene_go, "../analyses/GO-MWU/2019-crab-GO-annot.tab", sep = "\t", row.names = FALSE, quote = FALSE, col.names = FALSE)

still same error.

Gave me option to show error traceback.

Here is traceback:

Error in read.table(inname, sep = "\t", header = T, check.names = F) : no lines available in input

  1. stop("no lines available in input")
  2. read.table(inname, sep = "\t", header = T, check.names = F) at gomwu.functions.R#5
  3. clusteringGOs(goAnnotations, goDivision, clusterCutHeight) at gomwu.functions.R#21
  4. gomwuStats(input, goDatabase, goAnnotations, goDivision, perlPath = "../../../../../../../usr/bin/perl", largest = 0.1, smallest = 5, clusterCutHeight = 0.25, )
yaaminiv commented 4 years ago

Are you sure it's tab-delimited? It doesn't "look" right in Github...

Screen Shot 2020-04-20 at 12 37 26 PM

Can you do head and post a screenshot here just to confirm that it is tab-delimited?
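If a screenshot of `head` is hard to read, a rough alternative is to count tab-separated fields per line in R. This is only a sketch: the file below is a hypothetical two-line example written to a temporary path, and in practice `annot_path` would point at the real annotation file.

```r
# Hypothetical annotation file: gene ID, a tab, then semicolon-separated GO terms
annot_path <- tempfile(fileext = ".tab")
writeLines(c("gene1\tGO:0008150;GO:0003674", "gene2\tGO:0005575"), annot_path)

# Number of tab-separated fields on each line; quoting and comments are disabled
n_fields <- count.fields(annot_path, sep = "\t", quote = "", comment.char = "")
print(table(n_fields))  # a two-column file should show only the value 2

# Peek at the first lines; print() displays tabs as \t and shows any quotes
first_lines <- readLines(annot_path, n = 5)
print(first_lines)

# Flag leftover double quotes
has_quotes <- any(grepl('"', first_lines, fixed = TRUE))
print(has_quotes)
```

Lines whose field count differs from the rest are the "weird" rows worth inspecting.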

Also, you need to remove quotes (quote = FALSE) from the .csv file. If you look at the raw format, you can see that there are quotes around the entries:

Screen Shot 2020-04-20 at 12 40 31 PM
grace-ac commented 4 years ago
  1. I totally agree - i noticed those weird rows when I looked at it in GitHub... not sure what's going on there.

head command wasn't super useful in Rmd: Screen Shot 2020-04-20 at 1 31 44 PM (2) Screen Shot 2020-04-20 at 1 30 28 PM (2)

Here's head in terminal of same file (some rows still are weird):
Screen Shot 2020-04-20 at 1 32 20 PM

  2. fixed! wasn't looking at "raw" version of file, so I missed the quotes - thanks!
yaaminiv commented 4 years ago

Here's head in terminal of same file (some rows still are weird):

Interesting...do you still get the same error when you use GO-MWU with the fixed version of the .csv file?

grace-ac commented 4 years ago

um wow. the quotes around the .csv file must have been the issue. IT WORKS! gah thank you so much for helping! you get lots of points for all the commenting you did today ha

grace-ac commented 4 years ago

actually the plotting part of the script came up with an error: Error in .rowNamesDF<-(x, value = value) : invalid 'row.names' length

I'm working on figuring out what that's all about right now.

grace-ac commented 4 years ago

Turns out it's because there were no significantly enriched GO terms. thanks for all the help Yaamini!

kubu4 commented 4 years ago

quotes around the .csv file

My comment here is a bit off topic, but it's crucial for working with output files generated by R. There is an option in the write.csv() function to turn off quoting. Be sure to use that option every time. Otherwise, all the CSV outputs are an incredible headache to work with.
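As a minimal sketch of that option, the argument is `quote = FALSE`. The data frame below is invented for illustration and is not the crab data:

```r
# Hypothetical table of gene IDs and log2 fold changes
log2fc <- data.frame(
  gene = c("gene1", "gene2"),
  log2FC = c(1.5, -0.7),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")

# quote = FALSE drops the double quotes around character values and column names;
# row.names = FALSE avoids an extra unnamed first column
write.csv(log2fc, csv_path, row.names = FALSE, quote = FALSE)

csv_lines <- readLines(csv_path)
print(csv_lines)
```

One caveat: with quoting turned off, any field that itself contains a comma will be split into extra columns when the file is read back, so this is safest for simple ID and number columns like these.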

magnew1 commented 7 months ago

Hi @grace-ac @yaaminiv ! I'm popping on this thread because I'm having the same problem as Grace. I am positive I have no quotes in my .csv files and my .tab file is tab delimited - I can't figure out why it keeps giving me this error (same one Grace mentioned)?

Error in read.table(inname, sep = "\t", header = T, check.names = F) : no lines available in input

  4. stop("no lines available in input")
  3. read.table(inname, sep = "\t", header = T, check.names = F) at gomwu.functions.R#5
  2. clusteringGOs(goAnnotations, goDivision, clusterCutHeight) at gomwu.functions.R#27
  1. gomwuStats(input, goDatabase, goAnnotations, goDivision, perlPath = "perl", largest = 0.1, smallest = 5, clusterCutHeight = 0.25, )

My .tab file is here: https://github.com/magnew1/OsHV1-MWU/blob/main/CG_Gene_MWU.tab

Script for MWU analysis from the GitHub page is here: https://github.com/magnew1/OsHV1-MWU/blob/main/GO_MWU.R

Script for how I made the files is here: https://github.com/magnew1/OsHV1-MWU/blob/main/github_edgeR_GO_.R

Any ideas/help is appreciated!

kubu4 commented 7 months ago

We'll be happy to help, but need a bit more info:

  1. Can you please clarify which line(s) of code are throwing the error?
  2. Are you able to upload all the necessary files to run your code into GitHub? Specifically, I'd also like to see what this file looks like: fam1_CONvEXP_MWU.csv

Without all of the necessary files to run your code, it's a bit difficult to troubleshoot.

magnew1 commented 7 months ago

Yes!

  1. In the GO_MWU.R file it's line 31 gomwuStats function that is giving me that error
  2. Yes! You can now find that file at https://github.com/magnew1/OsHV1-MWU/blob/main/fam1_CONvEXP_MWU.csv

It is a csv containing gene name and -log(pvalues) as described in the GO_MWU.R file

The rest of the files are from the GO MWU Github Repository

kubu4 commented 7 months ago

I'd remove the header line (gene,signPval) from https://github.com/magnew1/OsHV1-MWU/blob/main/fam1_CONvEXP_MWU.csv and try to run gomwuStats() again.

magnew1 commented 7 months ago

I removed the first line in the .csv file and receive the same error with gomwuStats().

Additionally, the example .csv file from the original Github repository has a header, so I'm guessing that's not the issue (unless I'm missing something?). Photo of the example .csv attached.

Screen Shot 2024-02-09 at 10 34 42 AM
kubu4 commented 7 months ago

Looking at the GO-MWU documentation, it says the following:

  1. Put all this into the same directory:

    scripts: GO_MWU.R, gomwu_a.pl, gomwu_b.pl, gomwu.functions.R

Those scripts are missing from the GitHub repo you've shared. Is that the possible source of the issue?

If you add all the necessary files indicated by the GO-MWU documentation to your GitHub repo, we'll be happy to keep helping.
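A quick way to confirm the scripts are present and not empty is sketched below. The helper `check_gomwu_files()` is hypothetical (it is not part of GO-MWU) and only covers the four scripts named in the documentation excerpt above; a zero-byte or missing file would suggest a bad download.

```r
# The four scripts named in the GO-MWU documentation excerpt
required_files <- c("GO_MWU.R", "gomwu_a.pl", "gomwu_b.pl", "gomwu.functions.R")

# Hypothetical helper: report whether each file exists and has a non-zero size
check_gomwu_files <- function(dir = ".", files = required_files) {
  paths <- file.path(dir, files)
  sizes <- file.size(paths)  # NA for files that do not exist
  data.frame(
    file = files,
    exists = file.exists(paths),
    size_bytes = sizes,
    looks_ok = !is.na(sizes) & sizes > 0
  )
}

# Run against the current working directory
print(check_gomwu_files("."))
```

This only checks presence and size, so a file that downloaded as an HTML page instead of the raw script would still pass; opening the file to look at its first lines is a reasonable follow-up.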

magnew1 commented 7 months ago

I had them all in my directory but your comment gave me the idea to re-download all of the files and re-run, and it worked! I must have had a glitch or downloaded one of them wrong from the beginning. Thank you for all of your help, and patience (I am new to github, I appreciate your advice!).

kubu4 commented 7 months ago

Alright! Nice work!