CAnBioNet / TkNA


Issue on the interface version of TkNA #82

Open catslu opened 1 month ago

catslu commented 1 month ago

Dear team, it is hard to install TkNA in a conda environment: Infomap could not be compiled, and the Infomap GitHub page gives no instructions on requirements such as the gcc version. Anyway, I'm using your interface version of TkNA and followed the example datasets on GitHub: https://github.com/CAnBioNet/TkNA/tree/main/example_datasets_and_commands/microbiome_and_phenotype/input. Do I have to have two sets of experimental data? I have 10 samples from two groups, with RNA-Seq and metagenome data but no phenotype data. I integrated these two data types by row-binding them (like 'rbind') and prepared group_map.csv and type_map.csv. When I uploaded them to the interface page, it asked me to set two Experiment Names. I also noticed that in the toy network dataset, the files for Experiment1 are the same as those for Experiment2. Is that correct? What should I do?

newmanno commented 1 month ago

Hello there, and thanks for your interest in TkNA! I'm so sorry for the late response. Have had issues logging into GitHub for a while, but they are fixed now.

As for your questions:

  1. Infomap is not essential for running the pipeline. We could not get the Infomap Python module working, so a user who wants to run Infomap has to install and compile Infomap on their own system instead. That said, the Louvain method produces clustering similar to Infomap's and should not require any compilation, as it is a Python module. So we recommend using Louvain for users who have trouble getting Infomap compiled.
  2. You should not need to have two sets of experimental data. We requested that this be fixed on the web interface at one point, but it appears that has not happened yet. In the meantime, I took a look at the toy network dataset, and you are right: it appears we mistakenly added two copies of the same files. However, that does not change anything about how the script is run.

In your case, considering the webpage is not working properly (we will get that fixed ASAP), one thing you can do is upload the same file twice for your two experiments, then set only an individual p-value threshold and set the combined and FDR thresholds to 1 for both experiments. You should be able to do this for both the comparisons and the correlations, assuming you are not using the "percent agreement" option, which would not be appropriate for your data anyway. That way, it should technically only find the parameters that are significant in the one experiment you have, as all the calculated combined p-values and FDRs will essentially be ignored. I believe this SHOULD generate an output for you. Keep in mind, though, that the comparisons and correlations will then not be FDR-corrected for your dataset, so you will need to apply the FDR correction manually. The p-values for the comparisons can be found in the node_comparisons.csv file, and the correlation p-values can be found in network_output_comp.csv. You will see two p-value columns, but they should be identical, so the FDR correction can be performed on just one of them. Then you will have to remove the nodes and edges in the network that do not pass the FDR threshold you choose.
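For anyone following this workaround, here is a minimal sketch of the manual FDR step, assuming a Benjamini-Hochberg correction is what you want. It is not TkNA code. The column names `node` and `pval` and the toy table are hypothetical stand-ins, so check the actual headers in node_comparisons.csv and network_output_comp.csv before adapting it.

```python
import csv
import io


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def filter_by_fdr(rows, pvalue_column, fdr_threshold):
    """Keep only the rows whose adjusted p-value is at or below the threshold."""
    pvals = [float(row[pvalue_column]) for row in rows]
    adjusted = bh_adjust(pvals)
    return [dict(row, fdr=q) for row, q in zip(rows, adjusted) if q <= fdr_threshold]


# Toy stand-in for one of the output files; the headers here are hypothetical.
toy_csv = """node,pval
A,0.001
B,0.02
C,0.03
D,0.5
"""

rows = list(csv.DictReader(io.StringIO(toy_csv)))
kept = filter_by_fdr(rows, "pval", 0.05)
print([row["node"] for row in kept])  # prints ['A', 'B', 'C']
```

To use it on real output, replace the toy table with `csv.DictReader(open(path, newline=""))` and pass the name of one of the two identical p-value columns. The sketch assumes every row has a numeric p-value, so blank or missing entries would need to be dropped first.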

Again, sorry for it not working properly! We will fix it ASAP. Let me know if you have any more questions.

-Nolan