BeeCSI-Microbiome / R_analyses

Update initial_processing.R #21

Closed Kurtj-hub closed 2 years ago

Kurtj-hub commented 2 years ago

Attempted Speed-up for the clade count procedure.

Rough comparison on my system (no specific benchmarking for the Oxy dataset): old, 1-2 minutes; new, 2-3 seconds.
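A rough timing like the one above could be reproduced with `system.time()`. The sketch below is illustrative only: `sum_by_clade_loop()` and `sum_by_clade_vec()` are hypothetical stand-ins for an element-by-element and a vectorized aggregation, not the actual old and new `calculate_clade_counts()` code, and the data are simulated.

```r
# Simulated input: 20,000 read counts assigned to 50 made-up clade names.
set.seed(1)
clade <- sample(paste0("clade_", 1:50), 20000, replace = TRUE)
reads <- rpois(20000, 5)

# Hypothetical "old" style: accumulate totals one row at a time.
sum_by_clade_loop <- function(clade, reads) {
  totals <- numeric(0)
  for (i in seq_along(clade)) {
    nm <- clade[i]
    totals[nm] <- if (nm %in% names(totals)) totals[nm] + reads[i] else reads[i]
  }
  totals[order(names(totals))]
}

# Hypothetical "new" style: one vectorized aggregation with tapply().
sum_by_clade_vec <- function(clade, reads) {
  totals <- tapply(reads, clade, sum)
  totals <- setNames(as.numeric(totals), names(totals))
  totals[order(names(totals))]
}

# Time both versions and confirm they agree before comparing speed.
t_old <- system.time(old_res <- sum_by_clade_loop(clade, reads))
t_new <- system.time(new_res <- sum_by_clade_vec(clade, reads))
stopifnot(isTRUE(all.equal(old_res, new_res)))

print(c(old = t_old[["elapsed"]], new = t_new[["elapsed"]]))
```

Elapsed times vary by machine and dataset, so the numbers are only meaningful relative to each other.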

The output file format has changed slightly; it can be changed back if required.

LLansing commented 2 years ago

Tested with var_2021. Got this warning upon calculate_clade_counts() call:

Warning message: In CheckNameReservedWord(name, check) : Name 'root' is a reserved word as defined in NODE_RESERVED_NAMES_CONST. Using 'root2' instead.

I compared the results of calculate_clade_counts() using the old and the updated versions. I believe the numbers are identical across the tables, which is great. However, I see some odd naming results and a few other issues (all of the following is in regard to the raw_clade output; raw_taxon seems identical):

image

Great work on the optimization! The values seem to be correct and the speed is incredibly quick. All that remains seems to be some formatting and edge cases. I suggest testing with a few other datasets (available from the DB access app) and comparing the outputs of the old and new calculate_clade_counts. Additionally, it's probably good to run some of the subsequent analyses using the clade table (e.g. ANCOM) to check if they're still working.
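One way to script the old-versus-new comparison suggested above is sketched here. `compare_clade_tables()` is a hypothetical helper, and the `clade` key column, sample columns, and values are invented for illustration; the real raw_clade tables may be laid out differently.

```r
# Hypothetical helper: compare two clade-count tables by a key column,
# ignoring row order and column order.
compare_clade_tables <- function(old_tbl, new_tbl, key = "clade") {
  stopifnot(key %in% names(old_tbl), key %in% names(new_tbl))

  shared_rows <- intersect(old_tbl[[key]], new_tbl[[key]])
  shared_cols <- setdiff(intersect(names(old_tbl), names(new_tbl)), key)

  # Align both tables on the shared clades and shared count columns.
  old_sub <- as.matrix(old_tbl[match(shared_rows, old_tbl[[key]]), shared_cols, drop = FALSE])
  new_sub <- as.matrix(new_tbl[match(shared_rows, new_tbl[[key]]), shared_cols, drop = FALSE])

  list(
    only_in_old      = setdiff(old_tbl[[key]], new_tbl[[key]]),
    only_in_new      = setdiff(new_tbl[[key]], old_tbl[[key]]),
    cols_only_in_old = setdiff(names(old_tbl), names(new_tbl)),
    cols_only_in_new = setdiff(names(new_tbl), names(old_tbl)),
    counts_equal     = isTRUE(all.equal(old_sub, new_sub, check.attributes = FALSE))
  )
}

# Made-up tables: same counts, different row/column order, one renamed clade.
old_tbl <- data.frame(clade = c("Bacteria", "Firmicutes", "root"),
                      s1 = c(10, 4, 12), s2 = c(7, 3, 9),
                      stringsAsFactors = FALSE)
new_tbl <- data.frame(clade = c("root2", "Firmicutes", "Bacteria"),
                      s2 = c(9, 3, 7), s1 = c(12, 4, 10),
                      stringsAsFactors = FALSE)

res <- compare_clade_tables(old_tbl, new_tbl)
print(res)
```

Reporting name mismatches separately from count mismatches makes it easier to tell formatting issues apart from real differences in the numbers.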

I will be gone for August, so feel free to merge an update once you find the issues are addressed.

Kurtj-hub commented 2 years ago

Awesome, thanks @LLansing for getting this done before your break.

I appreciate the feedback. I was uncertain about the relevance of the other columns, as I thought they were mostly just artifacts. I will make the changes ASAP, and I hope you enjoy your break.