gavinmdouglas / picrust2_manuscript

Further questions regarding the categorize_by_function.py for PICRUSt2 #2

Closed mikelgutmut closed 2 years ago

mikelgutmut commented 3 years ago

Good afternoon,

I am writing with a follow-up to the process described in zina-R's issue (PiCrust2 categorize_by_function.py #1). I followed the steps involving the R function as indicated and was able to reproduce them. As a result, I got three separate datasets, one for each KO level. However, I intend to use my dataset for functional biomarker analysis with LEfSe, and for this I would like to have the information from all three levels in the same data table so that I can later create a cladogram. With taxonomic data you can include different hierarchical taxonomic levels separated by |, and this is what I am trying to do with the functional data. Could anyone help me with this? Does anyone know if there is a way to get this final output?

Thanks in advance,

Mikel.

gavinmdouglas commented 3 years ago

Hi @mikelgutmut,

Sorry, I don't have code to do that specifically, but all of the levels could be parsed in a similar way as in the R code referred to in that issue you linked.

marwa38 commented 2 years ago

Hi @mikelgutmut

Have you managed to do it at different levels? Cheers

mikelgutmut commented 2 years ago

Hi marwa38,

If I remember correctly, I ended up separating the levels in RStudio. I couldn't do it with PICRUSt itself.

Cheers,

Mikel.

25-5-1992 commented 2 years ago

Hi @gavinmdouglas, I am trying to run categorize_by_function on PICRUSt2 output in R. Are there any other dependencies that need to be installed in R? I am getting the following error:

test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)
Error in categorize_by_function_l3(test_ko, kegg_brite_map) : could not find function "categorize_by_function_l3"

gavinmdouglas commented 2 years ago

The script I provided is all that is needed - but you need to define the categorize_by_function_l3 function first (i.e., run that code first). Is it present in the file you're working with?

25-5-1992 commented 2 years ago

Hi @gavinmdouglas, there was an issue with "categorize_by_function_l3". I have now run the code successfully and created the table. Is there a way to create similar tables at level 1 and level 2?

gavinmdouglas commented 2 years ago

It's been a while since I've looked at that script, but yes I believe you would just need to change the function to regroup to a different column (the 1st or 2nd) rather than the 3rd.

Cheers,

Gavin

25-5-1992 commented 2 years ago

Hi @gavinmdouglas, it would be great if you could point out the places in the script that need to be changed for regrouping.

25-5-1992 commented 2 years ago

Hi @mikelgutmut, I have successfully created a table at level 3 by running "categorize_by_function_l3". Could you please share the code for creating the tables at the other two levels? And how did you merge the information from the three levels into a single dataset?

gavinmdouglas commented 2 years ago

That's not a script I maintain as part of PICRUSt2, but I think you would need to change pathway <- strsplit(pathway, ";")[[1]][3] to be pathway <- strsplit(pathway, ";")[[1]][1] for instance to get the first level of KEGG BRITE. You could try that and see if the output made sense.
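
A minimal sketch of that one-line change, written as a parameter instead of a hard-coded index. It is not the script from the linked issue: the helper name `categorize_by_level`, the column name `metadata_KEGG_Pathways`, and the assumed entry format (`L1;L2;L3` entries joined by `|`) are all assumptions for illustration, and the toy tables stand in for the real files.

```r
# Hypothetical helper (not part of PICRUSt2): regroup a KO table to one
# KEGG BRITE level. Assumes map entries look like "L1;L2;L3|L1;L2;L3".
categorize_by_level <- function(in_ko, brite_map, level = 3,
                                map_col = "metadata_KEGG_Pathways") {
  stopifnot(level %in% 1:3)
  shared <- intersect(rownames(in_ko), rownames(brite_map))
  if (length(shared) == 0) {
    stop("No KO IDs are shared between the table and the map")
  }
  ko_ids <- character(0)
  labels <- character(0)
  for (ko in shared) {
    pathways <- strsplit(brite_map[ko, map_col], "|", fixed = TRUE)[[1]]
    for (pathway in pathways) {
      # The line discussed in the thread, with 3 replaced by `level`.
      pathway <- strsplit(pathway, ";")[[1]][level]
      ko_ids <- c(ko_ids, ko)
      labels <- c(labels, pathway)
    }
  }
  # Sum the abundances of all KO rows that share a label.
  out <- rowsum(as.matrix(in_ko[ko_ids, , drop = FALSE]), group = labels)
  as.data.frame(out[rowSums(out) > 0, , drop = FALSE])
}

# Toy inputs standing in for the real mapping file and KO table.
toy_map <- data.frame(
  metadata_KEGG_Pathways = c(
    "Metabolism;Carbohydrate Metabolism;Glycolysis / Gluconeogenesis",
    paste0("Metabolism;Energy Metabolism;Methane metabolism|",
           "Genetic Information Processing;Translation;Ribosome")),
  row.names = c("K00001", "K00002"), stringsAsFactors = FALSE)
toy_ko <- data.frame(s1 = c(1, 2), s2 = c(3, 4),
                     row.names = c("K00001", "K00002"))

level1 <- categorize_by_level(toy_ko, toy_map, level = 1)
level3 <- categorize_by_level(toy_ko, toy_map, level = 3)
```

In this sketch a KO that maps to several pathways under the same higher-level category is counted once per pathway, so level 1 and level 2 totals can exceed the KO's own abundance. Whether that is the desired behaviour is worth checking against your data.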

Cheers,

Gavin

25-5-1992 commented 2 years ago

Thank you @gavinmdouglas. Changing pathway <- strsplit(pathway, ";")[[1]][3] to pathway <- strsplit(pathway, ";")[[1]][1] worked. I successfully created tables for level 1 and level 2. Is there a way to put the information from the three levels in one table?

gavinmdouglas commented 2 years ago

Great! I don't have a script handy to do that, so you would need to use custom code to do that.
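
One possible shape for such custom code, as a rough sketch only. It builds a single table whose row names are the level labels joined by `|` (for example `Metabolism|Energy Metabolism`), mirroring the taxonomy-style layout described at the top of this thread. The helper name `combine_brite_levels`, the column name `metadata_KEGG_Pathways`, and the `L1;L2;L3` entry format are assumptions, and whether LEfSe accepts spaces or other characters in these labels has not been checked here.

```r
# Hypothetical helper: one table holding level 1, level 1|2 and level 1|2|3 rows.
combine_brite_levels <- function(in_ko, brite_map,
                                 map_col = "metadata_KEGG_Pathways") {
  shared <- intersect(rownames(in_ko), rownames(brite_map))
  if (length(shared) == 0) {
    stop("No KO IDs are shared between the table and the map")
  }
  ko_ids <- character(0)
  labels <- character(0)
  for (ko in shared) {
    pathways <- strsplit(brite_map[ko, map_col], "|", fixed = TRUE)[[1]]
    for (pathway in pathways) {
      parts <- strsplit(pathway, ";", fixed = TRUE)[[1]]
      # Emit one label per depth: "L1", "L1|L2", "L1|L2|L3".
      for (depth in seq_along(parts)) {
        ko_ids <- c(ko_ids, ko)
        labels <- c(labels, paste(parts[seq_len(depth)], collapse = "|"))
      }
    }
  }
  # Sum abundances over every KO row that carries the same label.
  as.data.frame(rowsum(as.matrix(in_ko[ko_ids, , drop = FALSE]),
                       group = labels))
}

# Toy inputs standing in for the real mapping file and KO table.
toy_map <- data.frame(
  metadata_KEGG_Pathways = c(
    "Metabolism;Carbohydrate Metabolism;Glycolysis / Gluconeogenesis",
    paste0("Metabolism;Energy Metabolism;Methane metabolism|",
           "Genetic Information Processing;Translation;Ribosome")),
  row.names = c("K00001", "K00002"), stringsAsFactors = FALSE)
toy_ko <- data.frame(s1 = c(1, 2), s2 = c(3, 4),
                     row.names = c("K00001", "K00002"))

combined <- combine_brite_levels(toy_ko, toy_map)
rownames(combined)
```

The result could then be written out with `write.table(combined, sep = "\t", quote = FALSE, col.names = NA)` and adapted to whatever header rows the downstream tool expects.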

Cheers,

Gavin

25-5-1992 commented 2 years ago

Thanks @gavinmdouglas. It would be great if you can help in doing that, as I am not much into bioinformatics.

gavinmdouglas commented 2 years ago

For sure, if you have specific R code you need help with I would be happy to give feedback.

Cheers,

Gavin

25-5-1992 commented 2 years ago

Thanks Gavin.

Nisa435 commented 1 year ago

Hi Gavin, I am trying to run the R code for categorize_by_function. Below are the commands and the error I am getting.

kegg_brite_map <- read.table("picrust1_KO_BRITE_map.tsv", header=TRUE, sep="\t", quote = "", stringsAsFactors = FALSE, comment.char="", row.names=1)

test_ko <- read.table("KO_out.tsv", header=TRUE, sep="\t", row.names=1)

categorize_by_function_l3 <- function(test_ko, kegg_brite_mapping)

test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)

if(length(which(colnames(test_ko) == "KEGG_Pathways") > 0)) {
test_ko <- test_ko[, -which(colnames(test_ko) == "KEGG_Pathways")]
}

test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ]

Error: object 'test_ko_L3' not found

Thank you in advance

gavinmdouglas commented 1 year ago

Hi @Nisa435,

I should preface that this is just example R code and isn't something I officially maintain.

However, test_ko_L3 is being assigned at this step: test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map), so what error is given when you run that code?

Thanks,

Gavin

Nisa435 commented 1 year ago

Hi Gavin, Yes, it is assigned. When I run test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ], this is the error I am getting.

Error: object 'test_ko_L3' not found

gavinmdouglas commented 1 year ago

Hi again,

There must be an error earlier that is causing that variable to not be correctly assigned, otherwise the variable would be found. There should be an error message after that aforementioned command to assign it, is there not?

Thanks,

Gavin

Nisa435 commented 1 year ago

Hello, There is no error apart from that. Or I might be missing something.

kegg_brite_map <- read.table("picrust1_KO_BRITE_map.tsv", header=TRUE, sep="\t", quote = "", stringsAsFactors = FALSE, comment.char="", row.names=1)
test_ko <- read.table("KO_out.tsv", header=TRUE, sep="\t", row.names=1)
categorize_by_function_l3 <- function(test_ko, kegg_brite_mapping)
+ test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)
if(length(which(colnames(test_ko) == "KEGG_Pathways") > 0)) { test_ko <- test_ko[, -which(colnames(test_ko) == "KEGG_Pathways")]
+ }
test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ]
Error: object 'test_ko_L3' not found
orig_ko_L3 <- read.table("test_ko_L3.tsv", header=TRUE, sep="\t", row.names=1, skip=1, comment.char="", quote="")
orig_ko_L3 <- orig_ko_L3[, -which(colnames(orig_ko_L3) == "KEGG_Pathways")]
orig_ko_L3 <- orig_ko_L3[-which(rowSums(orig_ko_L3) == 0),]
identical(test_ko_L3_sorted, orig_ko_L3)
Error in identical(test_ko_L3_sorted, orig_ko_L3) : object 'test_ko_L3_sorted' not found

gavinmdouglas commented 1 year ago

You can run one command at a time and then look at the output object (with "head" for example) to check that objects are being created correctly.
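
A small sketch of what such step-by-step checks might look like. The helper `check_table` and the toy table are hypothetical and only illustrate the idea; in practice `toy` would be an object such as `kegg_brite_map`, `test_ko`, or `test_ko_L3`.

```r
# Hypothetical helper: stop early if an object is not a usable table.
check_table <- function(x, label) {
  if (is.function(x)) stop(label, " is a function, not a table")
  if (!is.data.frame(x)) stop(label, " is not a data frame")
  if (nrow(x) == 0) stop(label, " has no rows")
  invisible(TRUE)
}

# Toy table standing in for an object created by an earlier command.
toy <- data.frame(s1 = c(1, 2), s2 = c(3, 4),
                  row.names = c("K00001", "K00002"))

exists("toy")        # was the object created at all?
class(toy)           # is it the kind of object you expect?
dim(toy)             # does it have rows and columns?
head(toy, n = 5)     # do the first rows look sensible?
check_table(toy, "toy")
```

Running the checks after each assignment makes it easier to see which command first produces something unexpected.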

Nisa435 commented 1 year ago

The kegg_brite_map and test_ko tables are read in correctly. However, after running the test_ko_L3 step I got the output below.

test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)

head(test_ko_L3,n=10)

1 function (test_ko, kegg_brite_mapping)
2 head(categorize_by_function_l3)

gavinmdouglas commented 1 year ago

Hey again,

This is a very odd error - it seems like test_ko_L3 is being assigned the value of the function categorize_by_function_l3 itself rather than as the output of that tool. Can you confirm that the function categorize_by_function_l3 was already defined prior to running these commands?
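
One pattern that would be consistent with the commands quoted above, offered as a guess rather than a confirmed diagnosis: in R, a `function(...)` header with no body takes the next expression as its body. If only the header line of the function was pasted, the following assignment becomes the function body and is never run, so the variable it was meant to create does not exist. The names `f`, `g`, `y`, and `z` below are placeholders.

```r
# Pitfall: a header with no body. R reads the next line as the body of f.
f <- function(x)
y <- f(1)

is.function(f)      # the definition itself succeeded without any error
deparse(body(f))    # the body is the line that was meant to be a call

# With the full definition (header plus braces and body), the call works.
g <- function(x) {
  x + 1
}
z <- g(1)
```

At the interactive console this shows up as a `+` continuation prompt on the line after the header, which is a hint that R is still waiting for the rest of the definition.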

Gavin

Nisa435 commented 1 year ago

Hi Gavin, Sure.

kegg_brite_map <- read.table("picrust1_KO_BRITE_map.tsv", header=TRUE, sep="\t", quote = "", stringsAsFactors = FALSE, comment.char="", row.names=1)
test_ko <- read.table("KO_out.tsv", header=TRUE, sep="\t", row.names=1)
categorize_by_function_l3 <- function(test_ko, kegg_brite_mapping)
+ test_ko_L3 <- categorize_by_function_l3(test_ko, kegg_brite_map)
if(length(which(colnames(test_ko) == "KEGG_Pathways") > 0)) { test_ko <- test_ko[, -which(colnames(test_ko) == "KEGG_Pathways")]
+ }
test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ]
Error: object 'test_ko_L3' not found

EJS01 commented 1 month ago

Hi! I have finished running this command but am unable to continue:

pathway_pipeline.py -i KO_metagenome_out/pred_metagenome_contrib.tsv.gz -o KEGG_pathways_out --no_regroup --map picrust2/picrust2/default_files/pathway_mapfiles/KEGG_pathways_to_KO.tsv

Running the test_ko_L3 step always results in this error:

Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate

Hoping to hear from you!

gavinmdouglas commented 1 month ago

Hi @EJS01,

That error makes it sound like the input file is empty (or at least none of the KEGG ortholog IDs intersect between the table and the mapfile). I'm not sure what the issue is, but just so you know this is the GitHub repo for the manuscript code (i.e., code used for running statistical analyses in the paper, and not the codebase itself). You can find the actual codebase here: https://github.com/picrust/picrust2.
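
A quick way to test that guess is to count how many row IDs the two tables share. This is only a sketch on toy data: the toy tables stand in for `test_ko` and `kegg_brite_map`, and the column name `metadata_KEGG_Pathways` is an assumption.

```r
# Toy inputs: one KO is in the map, one is not.
toy_ko <- data.frame(s1 = c(1, 2), row.names = c("K00001", "K99999"))
toy_map <- data.frame(
  metadata_KEGG_Pathways =
    "Metabolism;Carbohydrate Metabolism;Glycolysis / Gluconeogenesis",
  row.names = "K00001", stringsAsFactors = FALSE)

# IDs present in both tables; zero here would leave nothing to aggregate.
shared <- intersect(rownames(toy_ko), rownames(toy_map))
length(shared)

# IDs in the abundance table that the map does not know about.
missing_kos <- setdiff(rownames(toy_ko), rownames(toy_map))
head(missing_kos)
```

If `length(shared)` is 0 on the real data, comparing `head(rownames(...))` of both tables should show whether the IDs differ in kind or format, for example pathway IDs or prefixed IDs in one table and plain KO IDs in the other.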

Cheers,

Gavin

EJS01 commented 1 month ago

Hi Gavin,

The other commands are now working, and I am able to see level 3. However, running this command gives another error:

test_ko_L3_sorted <- test_ko_L3[rownames(orig_ko_L3), ]
Error: object 'orig_ko_L3' not found

Also, is there code we can use to get the other levels (e.g., levels 1 and 2)?

gavinmdouglas commented 1 month ago

Hi @EJS01,

Those are just rough example R commands, rather than actual commands that should be run. If you read in the orig_ko_L3 table first you wouldn't get that error. That R code is just an example of how to regroup a table in R, rather than an official maintained function. However, you could test changing the "pathway <- strsplit(pathway, ";")[[1]][3]" line to regroup to different levels. It's been a long time since I've looked at those tables, but I believe taking the first or second element rather than the third would let you regroup to different KEGG levels.

Cheers,

Gavin

EJS01 commented 1 month ago

This is now working. Thank you!
