Issue with setup_analysis in comparing2groups vignette

fgcz / prolfqua

Differential Expression Analysis tool box R lang package for omics data

https://pubs.acs.org/doi/pdf/10.1021/acs.jproteome.2c00441

MIT License

Issue with setup_analysis in comparing2groups vignette #68

Closed: jrhougland closed this issue 5 months ago

jrhougland commented 5 months ago

Hello,

I am attempting to follow the script outlined in the "Comparing Two Groups with prolfqua" vignette; however, I keep receiving the error "Error in basename(data[[table$fileName]]) : a character vector argument expected" when running setup_analysis to create sampleName from the fileName column. Do you have any advice on what the issue may be? Our data from MaxQuant should be in the same format as the data used in the example.

Thank you,

Juliana
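For reference, base R raises this exact message when basename() receives NULL, and NULL is what data[[name]] returns when a data frame has no column called name. The sketch below reproduces that with a toy data frame. The column names are hypothetical and the code does not call prolfqua, so it only illustrates the mechanism.

```r
# Toy data frame that lacks the column we are about to look up.
toy <- data.frame(sample = c("10_pic01", "11_bn50"), stringsAsFactors = FALSE)

# `[[` on a missing column returns NULL instead of raising an error.
missing_col <- toy[["raw.file"]]
print(is.null(missing_col))

# basename() on NULL then fails; capture the message instead of stopping.
msg <- tryCatch(basename(missing_col), error = function(e) conditionMessage(e))
print(msg)
```

If this is what happens inside setup_analysis, the first thing to check is whether the column named in the configuration actually exists in the data passed in.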

jjGG commented 5 months ago

Hello Juliana,

Thanks for your comment. Since the main developer of prolfqua is out of office for a while, I will try to help you. Judging from the error, it looks to me as if your raw-file names are the issue.

Are you starting the analysis with the proteinGroups.txt file or with some other input to prolfqua?

Could you maybe share the top 100 lines of your input?

Can you confirm that the vignette (comparing two groups) is running for you?

Best regards, Jonas

jrhougland commented 5 months ago

Hi Jonas,

Thank you for reaching out. I have attached a csv file which shows the first 100 lines of our data set after using the tidyMQ_ProteinGroups function. I believe the vignette is running, but is there a way I can confirm that? I did install the prolfqua package with the vignettes.

Thanks,

Juliana

proteinGroups_ex.csv
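On the question of confirming that the vignettes were installed: one way, sketched here with base R utilities only, is to list the vignettes that R can find for the package. The check is wrapped in requireNamespace() so that it also runs on a machine where prolfqua is not installed.

```r
# TRUE if the prolfqua package can be loaded on this machine.
has_prolfqua <- requireNamespace("prolfqua", quietly = TRUE)

if (has_prolfqua) {
  # List the installed vignettes; an empty table means none were built.
  v <- vignette(package = "prolfqua")
  print(v$results[, c("Item", "Title"), drop = FALSE])
} else {
  message("prolfqua is not installed in this R library")
}
```

In an interactive session, browseVignettes("prolfqua") opens the same list in a browser.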

jjGG commented 5 months ago

Hello Juliana,

Your "startdata" looks good after reading in with the tidyMQ_ProteinGroups function.

At this point it is important that the "annotation" file is correct.

It should have the same tab-separated columns as in the example and, very importantly, the names in its raw.file column have to match the sample names from your proteinGroups.txt, e.g.:

10_pic01 11_bn50 12_bn57 13_bn58 14_bn60 15_bn59 16_bn54 17_bn53 18_bn55 19_bn56 20_bn43 22_pic02 23_bn48 24_bn49 25_bn52 26_bn51

In the Grouping or Condition column you then specify the groups that you are comparing.

I hope this helps. Best regards, Jonas
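A quick way to check that the names match is to compare the two columns before joining. This is a minimal sketch with toy tables; the group labels and intensity values are made up, and only the sample names are taken from the list above.

```r
# Hypothetical stand-ins for the annotation table and the tidied MaxQuant data.
annot <- data.frame(
  raw.file = c("10_pic01", "11_bn50", "12_bn57"),
  group = c("A", "B", "B"),
  stringsAsFactors = FALSE
)
data <- data.frame(
  raw.file = c("10_pic01", "11_bn50", "13_bn58"),
  intensity = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Names present in only one table would be dropped by an inner join.
only_in_annot <- setdiff(annot$raw.file, data$raw.file)
only_in_data <- setdiff(data$raw.file, annot$raw.file)
print(only_in_annot)
print(only_in_data)
```

Both results should be empty if every file is annotated and every annotated file is in the data.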

jrhougland commented 5 months ago

I believe the annotation file is correct; please find it attached. I then tried to inner join the annotation and proteinGroups tables with:

startdata <- inner_join(annot, data, by = c("sample" = "raw.file"))

Everything before the creation of "adata" works; the error occurs at this step:

adata <- setup_analysis(startdata, config)

I'm not sure what the issue is, because raw.file is a character vector, as it needs to be for sampleName.

Thank you for your help,

Juliana

MQ_metadataALL2.xlsx

jjGG commented 5 months ago

Hello Juliana,

Ideally, you add a column that is the same in both tables. In your case the "raw.file" column is missing from the annotation file. It is also recommended that the "sample" column holds the labels. Be aware that you will also lose all files for which no group, visit, or batch is specified, because later, when you define the atable and specify the contrasts, you need to state concretely what you are comparing with what.

Best regards, Jonas
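One possible reading of the error, which this thread does not confirm: when two tables are joined on differently named keys, the result keeps only the left-hand key name, so a later lookup of "raw.file" finds nothing. The sketch below shows this with toy tables and base R merge() as a stand-in for inner_join (dplyr's inner_join with by = c("sample" = "raw.file") likewise keeps only "sample" by default). It assumes the configuration names raw.file as the file-name column, and the values are made up.

```r
# Toy annotation keyed by "sample" and toy data keyed by "raw.file".
annot <- data.frame(
  sample = c("10_pic01", "11_bn50"),
  group = c("A", "B"),
  stringsAsFactors = FALSE
)
data <- data.frame(
  raw.file = c("10_pic01", "11_bn50"),
  intensity = c(1.2, 3.4),
  stringsAsFactors = FALSE
)

# Joining on differently named keys keeps the left-hand name only.
joined <- merge(annot, data, by.x = "sample", by.y = "raw.file")
print(names(joined))

# Adding a raw.file column to the annotation keeps that name after the join,
# while "sample" stays available as a label column.
annot$raw.file <- annot$sample
fixed <- merge(annot, data, by = "raw.file")
print(names(fixed))
```

With the second form, both tables share the key column, which matches the advice above.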

jrhougland commented 5 months ago

So the "sample" column in the annotation file should be renamed as "raw.file"? I did not have any issues with inner joining the files originally with different header names. Yes, I understand that we will lose data where there is no group or visit or batch identifier, which is okay for our purposes!

Thank you,

Juliana

jjGG commented 5 months ago

I have not tested this yet. I will try to look into it later today and send you a brief R snippet that should work for your annotation and for the first lines of your startdata. I also understand that the inner_join should work the way you do it; still, in my opinion the SampleName column (or Name column) is missing.

Best regards, Jonas

jrhougland commented 5 months ago

Thank you Jonas, I will also try to rename the column with SampleName and see if that makes a difference.

Thanks,

Juliana

jjGG commented 5 months ago

Hello Juliana,

Here is a little code snippet with input that does the job on your sample data. Best regards, Jonas

prolfqua_githubIssues.zip

jrhougland commented 5 months ago

Thank you! This script worked and was very helpful.

jjGG commented 5 months ago

Perfect. Thanks for trying and using prolfqua. If you run into trouble, please open an issue again.