federicomarini / pcaExplorer

pcaExplorer - Interactive exploration of Principal Components of Samples and Genes in RNA-seq data
https://federicomarini.github.io/pcaExplorer/

Countmatrix and sample metadata table not uploading properly #16

Open Justin1609 opened 2 years ago

Justin1609 commented 2 years ago

Hi there

I am trying to upload my own count matrix and sample metadata table using the interactive version of the tool, but it doesn't seem to be reading my input tables correctly. I made my tables in Excel and modeled them on the "airway" demo data. I saved the Excel file as a CSV file, but it just doesn't seem to work. I urgently need to plot this data, so any help would be greatly appreciated. I can send you my count matrix and sample metadata table on request.

Many thanks

J

federicomarini commented 2 years ago

Hi @Justin1609, did you check whether the files are really in CSV format? Despite the extension, a file saved from Excel is sometimes not actually comma-delimited (in some locales Excel uses semicolons). You can check this by opening the files in any text editor. If they look fine, try reading them into R yourself (before calling the app), and then call the app, passing the count matrix and the metadata table through the respective parameters (countmatrix and coldata). HTH, Federico
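A quick way to check the delimiter without eyeballing the file is to let a parser guess it. The sketch below is in Python rather than R and uses the standard-library `csv.Sniffer`; the helper names `sniff_delimiter` and `detect_delimiter` are hypothetical, and the candidate set (comma, semicolon, tab) is an assumption about what spreadsheet exports typically produce. In R, the corresponding fix is to pick `read.csv`, `read.csv2` (semicolon) or `read.delim` (tab) to match what the sniffer reports.

```python
import csv

# Delimiters that spreadsheet exports commonly use (assumption, adjust as needed).
CANDIDATES = ",;\t"

def sniff_delimiter(sample: str) -> str:
    """Guess the field delimiter from the first few lines of a text table."""
    dialect = csv.Sniffer().sniff(sample, delimiters=CANDIDATES)
    return dialect.delimiter

def detect_delimiter(path: str, sample_bytes: int = 4096) -> str:
    """Read the start of a file (dropping an Excel UTF-8 BOM, if any) and sniff it."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return sniff_delimiter(handle.read(sample_bytes))
```

For example, `detect_delimiter("countmatrix.csv")` (hypothetical file name) returning `";"` would explain why a comma-based reader puts every row into a single column. If no candidate occurs consistently across lines, `csv.Sniffer` raises `csv.Error`, which is itself a hint that the file is not a clean delimited table.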

Justin1609 commented 2 years ago

Hi there Federico

Thanks so much, I managed to get it sorted out. I didn't realize that you don't have to transpose the count matrix before inputting it. Why don't you transpose the data in your tool? By transpose I mean having samples as objects and genes as variables. I am doing PCA on counts data from RNA-seq analysis of Saccharomyces cerevisiae. Would the output look different if the count matrix was transposed as I described? I am also having issues with the gene annotation file for S. cerevisiae, as there is no entry for it in the usual databases used in the pcaExplorer examples. Do you know of another database I can use for S. cerevisiae annotations? I have a CSV file with the S. cerevisiae gene IDs in column 1 and the standard gene name for each gene ID in column 2. If you could help with these issues, I would really appreciate it.

Kind regards

Justin

(Email reply to Federico Marini's comment of Wed, Nov 24, 2021, 1:47 AM.)

Justin1609 commented 2 years ago

Hi Federico

Could you also please tell me how I can edit the title and the legend name of the plot? I would also like to remove the sample labels from the plot and change the color scheme of the different sample groups. I tried to do this in R, but pcaplot doesn't seem to generate the object that ggplot requires to edit these details.

Kind regards

Justin


federicomarini commented 2 years ago

> Hi there Federico, thanks so much, I managed to get it sorted out. I didn't realize that you don't have to transpose the count matrix before inputting it. Why is it that you don't transpose the data in your tool? By transpose I mean having samples as objects and genes as variables.

Well, the reason is more "historical": in bioinformatics, it is more common to have genes as features on the rows and samples on the columns, so I stuck to this "classical" layout. Yes, a more classical biostatistics-oriented view would indeed be the transposed one. But hey... 🤷

> I am doing PCA on counts data from RNA seq analysis for Saccharomyces cerevisiae. Would the output look different if the count matrix was transposed as I described?

Sure, it would: stick to the expected format (genes on the rows, samples on the columns), and it will be fine.
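To see why the orientation matters, here is a small numpy sketch (Python, not the package's R code) of a tool that expects a genes x samples matrix and transposes it internally before running PCA on the samples, as DESeq2's `plotPCA` does with `prcomp(t(...))`; as far as I know, `pcaplot` is adapted from it. The data are simulated and the function names are hypothetical. Passing the correctly oriented matrix yields one point per sample; passing an already-transposed (samples x genes) matrix silently yields one point per gene instead.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """PCA via SVD with rows as observations and columns as variables.
    Returns the observation scores on the first n_components PCs."""
    centered = data - data.mean(axis=0)  # center each variable (column)
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def sample_pca(genes_by_samples, n_components=2):
    """Hypothetical tool: expects genes x samples and transposes so samples are rows."""
    return pca_scores(genes_by_samples.T, n_components)

rng = np.random.default_rng(0)
# Simulated log-scale expression: 500 genes (rows) x 6 samples (columns).
expr = np.log2(rng.poisson(100, size=(500, 6)) + 1.0)

ok = sample_pca(expr)       # shape (6, 2): one point per sample, as intended
wrong = sample_pca(expr.T)  # shape (500, 2): genes end up plotted as "samples"
```

The layout itself is only bookkeeping; what matters is that the input matches what the tool assumes.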

> I am also having issues with the gene annotation file for S. cerevisiae as there is no entry for this on the normal databases that are used in the examples for using pcaExplorer. Do you maybe know of another database I can use for the annotations of S. cerevisiae? I have a CSV file where I have the Gene IDs of S. cerevisiae in column 1 and then the Standard gene names for each Gene ID in the second column. If you could help out with these issues I would really appreciate it. Kind regards, Justin

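I don't have much experience with yeast, to be honest, but some annotation packages are available in Bioconductor: have a look at org.Sc.sgd.db (https://bioconductor.org/packages/release/data/annotation/html/org.Sc.sgd.db.html).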

Federico
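If the Bioconductor package is not an option, a two-column CSV like the one Justin describes can also be turned into an annotation table directly. Below is a Python sketch, assuming the file has a header row, gene IDs in column 1 and standard names in column 2; the function names are hypothetical. Because many yeast ORFs have only a systematic name, it falls back to the gene ID when the name is empty. If I read the pcaExplorer documentation correctly, its `annotation` argument is a data.frame with gene IDs as row names and a `gene_name` column, so the output is laid out to be read back in R with `read.csv(..., row.names = 1)`; treat that mapping as an assumption to verify.

```python
import csv

def read_id_to_name(handle):
    """Map gene ID (column 1) to standard name (column 2) from a CSV with a header.
    Empty or missing names fall back to the gene ID itself."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row (assumed present)
    mapping = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank lines
        gene_id = row[0].strip()
        name = row[1].strip() if len(row) > 1 else ""
        mapping[gene_id] = name or gene_id
    return mapping

def write_annotation(mapping, handle):
    """Write gene IDs in the first column, followed by a gene_name column."""
    writer = csv.writer(handle)
    writer.writerow(["gene_id", "gene_name"])
    for gene_id, name in mapping.items():
        writer.writerow([gene_id, name])
```

Typical use would be opening the original CSV and a new output file (both with `newline=""`) and passing them to these two functions in turn.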

federicomarini commented 2 years ago

> Hi Federico, could you also please tell me how I can edit the title and the legend name of the plot? And I would like to remove the sample labels from the plot and change the color scheme of the different sample groups. I tried to do this in R but pcaplot doesn't generate the correct object that ggplot requires to be able to edit these details. Kind regards, Justin

I guess for this type of request you are probably best served by building the ggplot object from scratch. Actually, the objects returned by pcaplot are ggplot objects, so this kind of customization should work on them directly. If you are a little familiar with code, feel free to take the source and adapt it to your needs. HTH, Federico

Justin1609 commented 2 years ago

Thanks very much, @federicomarini, I really appreciate it. I did have a chat with a co-supervisor of mine who is a biostatistician, so why would he have suggested that I transpose the count matrix? Apologies, I am very new to the realm of bioinformatics, so I would just like to understand how transposing would affect the PCA output, if at all. For example, would the sample view in your program look different if the data were transposed? Why would it need to be transposed? Is there any difference between transposing the data and using the format that you use?

Thanks, I will definitely check out the database in R.

Regarding editing the plot, I receive an error when trying to pass the pcaplot object as the required "pcobj" for ggplot. I am not too familiar with the coding side either; could you point me to resources on how to alter code like that in the pcaplot user guide, so that I could, for example, change the colours of the circles and edit the legend name and the title? This would be extremely helpful.

Finally, I realise this is not the intended purpose of the program, but I was wondering if you could give me some advice on how to perform an OPLS analysis, ideally with something as user-friendly as your program that uses a similar input and method. I have tried the ropls package in R, but I honestly cannot seem to figure it out. Any advice would be truly appreciated.

Kind regards

Justin

(Email reply to Federico Marini's comment of Wed, Nov 24, 2021, 3:08 PM.)

federicomarini commented 2 years ago

No problem, I am aware we in bioinformatics are doing things by default in a transposed way 😉

If you transpose it, in the end you change the point of view: instead of describing the samples through linear combinations of the genes, you describe the genes through linear combinations of the samples!
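This change of point of view can be made concrete with a small numpy sketch on simulated data (Python rather than R; the function name is hypothetical). Running PCA with samples as rows gives one score per sample and one loading per gene, so each component is a linear combination of the genes; running it on the untransposed genes x samples matrix gives one score per gene and one loading per sample. The centering changes too (per gene in the first case, per sample in the second), so the two analyses are not simply mirror images of each other.

```python
import numpy as np

def pca(data, n_components=2):
    """PCA via SVD; rows are observations, columns are variables.
    Returns (scores, loadings): scores have one row per observation,
    loadings one row per variable."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    return scores, loadings

rng = np.random.default_rng(1)
expr = np.log2(rng.poisson(100, size=(500, 6)) + 1.0)  # 500 genes x 6 samples

# Samples as observations: each PC is a weighted combination of the 500 genes.
sample_scores, gene_loadings = pca(expr.T)   # shapes (6, 2) and (500, 2)
# Genes as observations: each PC is a weighted combination of the 6 samples.
gene_scores, sample_loadings = pca(expr)     # shapes (500, 2) and (6, 2)
```

In both cases the scores are the centered data multiplied by the loadings, which is exactly the "linear combination" being described.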

For editing the ggplot object, I would say any generic ggplot tutorial would do. I have none to recommend specifically at the moment, but do check out https://datavizm20.classes.andrewheiss.com/, which I used to recommend for many other reasons too!

An OPLS analysis is outside pcaExplorer's scope per se, but very much within the broader dimensionality-reduction business. Do have a look at the Modern Statistics for Modern Biology (MSMB) book by Holmes & Huber, available online; if I recall correctly, it introduces a couple of these alternatives to PCA.

Federico