BaderLab / EnrichmentMapApp

The EnrichmentMap Cytoscape App allows you to visualize the results of gene-set enrichment as a network.
http://apps.cytoscape.org/apps/enrichmentmap
GNU Lesser General Public License v2.1

Issue with Parsing .txt or .xls Files in EnrichmentMap: "For input string: 'I<' Error" #543

Closed astrum1337 closed 4 months ago

astrum1337 commented 4 months ago

Hello!

I've been experiencing a recurring issue when trying to build an enrichment map in Cytoscape using the EnrichmentMap plugin. Despite following the plugin documentation and ensuring my input files (.txt format) conform to the required structure, I keep encountering the following error during the parsing stage:

Building Enrichment Map
Parsing Generic Result file
Error: For input string: "I<"

Analysis Type: GSEA
Input Files: Uploaded as .txt format, containing columns for "ID", "Description", "SIZE", "ES", "NES", "NOM p-val", "FDR q-val", "FWER p-val", "RANK AT MAX", and "LEADING EDGE". The first column, "ID", contains Reactome pathway IDs.
Additional File: A .gmt file corresponding to Reactome pathways, obtained from a reputable source and confirmed to be in the correct format.
Settings in EnrichmentMap: Analysis type set to GSEA, FDR cut-off value at 0.05, and other default settings for dataset edges and connectivity.

I have checked the .txt files for formatting errors and unexpected characters, especially around the areas where the error might be originating. I have also ensured that the file encoding is UTF-8 without BOM, simplified the dataset to test whether the issue persists with a smaller set of data, and tried test sample data known to work with EnrichmentMap, yet I encountered the same error.

Could you please provide guidance on how to resolve this error? Is there a specific format or encoding requirement for the .txt files that I might be overlooking? Any assistance or insights you could offer would be greatly appreciated.

Thank you for your time and help!

kindest, ermir

risserlin commented 4 months ago

Hi Ermir, I have never seen that error before. It is weird that you are specifying GSEA as the analysis type and giving it GSEA input, yet the error message implies that it is parsing a generic file instead of a GSEA file. Would you be able to attach an example of your input files so I can try to recreate the issue? Thanks, Ruth
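The wording "For input string" resembles the message Java produces when text cannot be parsed as a number, so one quick diagnostic is to scan the columns that should be numeric for cells that are not. The sketch below is only an illustration: the function name `find_non_numeric`, the sample rows, the pathway IDs, and the choice of which columns to check are all made up, and it does not claim to reproduce what EnrichmentMap's parser does.

```python
import csv
import io

def find_non_numeric(lines, numeric_columns):
    """Return (line_number, column, value) for cells that do not parse as numbers."""
    reader = csv.DictReader(lines, delimiter="\t")
    problems = []
    for line_number, row in enumerate(reader, start=2):  # the header is line 1
        for column in numeric_columns:
            value = (row.get(column) or "").strip()
            try:
                float(value)
            except ValueError:
                problems.append((line_number, column, value))
    return problems

# Hypothetical sample data; the IDs and values are invented for illustration.
sample = io.StringIO(
    "ID\tDescription\tSIZE\tES\tNES\n"
    "R-RNO-0000001\tHypothetical pathway A\t25\t0.61\t1.92\n"
    "R-RNO-0000002\tHypothetical pathway B\tI<\t0.40\t1.10\n"
)
print(find_non_numeric(sample, ["SIZE", "ES", "NES"]))
```

For a real file, pass an open file handle (for example `open("results.txt", newline="")`) in place of `sample`.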

risserlin commented 4 months ago

Hi Ermir, There are a few issues with your data files. Where are you running your enrichment analysis? It might be modifying some of the names of your sets inadvertently.

In your .txt enrichment files, the first column of the results needs to match the first column of your genesets file (gmt) exactly. Currently your enrichment results files have a substring of the geneset identifier (I think the last portion, containing the id) instead of the full name.

For example, from this: <Screen Shot 2024-03-08 at 11.09.05 AM.png>

It needs to look like this (I just moved the id over to the third column, as GSEA is expecting a column there called GS details that EM generally ignores):

<Screen Shot 2024-03-08 at 11.10.56 AM.png>

But this still won’t fix it, because it looks like the names in the enrichment file have been shortened as well: the above geneset actually has an extra .versionnumber in the gmt file:

<Screen Shot 2024-03-08 at 11.12.28 AM.png>

So after you fix the issue with the enrichment file, your network will still fail to build because it won’t be able to find the geneset in the gmt file.
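The matching requirement described above can be checked before loading anything into Cytoscape. The following is a minimal sketch, assuming both files are tab-delimited and the enrichment file has a header row. The helper names `first_column` and `missing_from_gmt`, and all sample names and IDs, are hypothetical.

```python
import io

def first_column(lines, skip_header=False):
    """Collect the first tab-separated field of every non-empty line."""
    values = []
    for index, line in enumerate(lines):
        if skip_header and index == 0:
            continue
        line = line.rstrip("\r\n")
        if line:
            values.append(line.split("\t")[0])
    return values

def missing_from_gmt(result_lines, gmt_lines):
    """List result names that have no exact match among the gmt names."""
    gmt_names = set(first_column(gmt_lines))
    return [name for name in first_column(result_lines, skip_header=True)
            if name not in gmt_names]

# Invented sample data: one row uses only the ID, the other uses the full name.
gmt = io.StringIO(
    "HYPOTHETICAL PATHWAY A%REACTOME%R-RNO-0000001.1\tHypothetical pathway A\tGene1\tGene2\n"
    "HYPOTHETICAL PATHWAY B%REACTOME%R-RNO-0000002.3\tHypothetical pathway B\tGene3\tGene4\n"
)
results = io.StringIO(
    "ID\tDescription\tSIZE\n"
    "R-RNO-0000001\tHypothetical pathway A\t25\n"
    "HYPOTHETICAL PATHWAY B%REACTOME%R-RNO-0000002.3\tHypothetical pathway B\t40\n"
)
print(missing_from_gmt(results, gmt))  # only the bare ID is reported as missing
```

An empty list means every name in the enrichment file has an exact counterpart in the gmt file. Anything printed would not be found when the network is built.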

What program are you running to create your enrichment results?

Thanks, Ruth

risserlin commented 4 months ago

Notes: call from R for enrichments:

library(ReactomePA)
fgsea_react <- gsePathway(geneList = geneList, organism = "rat", minGSSize = 15, maxGSSize = 500, eps = 0, nPermSimple = 10000, seed = TRUE)

Using baderlab genesets for rat, or genesets from Reactome; neither of them has the identifier parsed out into the first column.

File format not compatible with EM
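One possible workaround, sketched here under stated assumptions, is to map each bare identifier in the results back to the full geneset name from the gmt file and move the identifier to the third column, as described earlier in the thread. The assumptions are: that the gmt names end with the identifier after a final `%`, that any version is a trailing `.` followed by digits, and that each result row has at least two columns. The helper names and sample values are hypothetical, and this is not an official EnrichmentMap conversion.

```python
def build_id_lookup(gmt_names):
    """Map each bare identifier (version suffix removed) to its full gmt name."""
    lookup = {}
    for name in gmt_names:
        identifier = name.rsplit("%", 1)[-1]  # text after the last '%'
        base, dot, suffix = identifier.rpartition(".")
        bare = base if dot and suffix.isdigit() else identifier
        lookup[bare] = name
    return lookup

def rename_rows(rows, lookup):
    """Put the full gmt name first and move the original ID to the third column."""
    renamed = []
    for row in rows:
        full_name = lookup.get(row[0])
        if full_name is None:
            renamed.append(row)  # unmatched rows are left unchanged so they are easy to spot
        else:
            renamed.append([full_name, row[1], row[0]] + row[2:])
    return renamed

# Invented sample data for illustration only.
lookup = build_id_lookup(["HYPOTHETICAL PATHWAY A%REACTOME%R-RNO-0000001.1"])
rows = [["R-RNO-0000001", "Hypothetical pathway A", "25", "0.61"]]
for row in rename_rows(rows, lookup):
    print("\t".join(row))
```

Rows whose ID has no counterpart in the lookup are passed through untouched, so they can be reviewed by hand rather than silently dropped.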