Generating local CAZy db locally rather than on a cluster

HobnobMancer / cazomevolve

`cazomevolve` ('cazome-evolve') investigates the evolution of CAZomes, and identifies CAZy families that co-occur within the genomes of candidate species, more frequently than would be expected by lineage.

https://hobnobmancer.github.io/cazomevolve/

MIT License


Generating local CAZy db locally rather than on a cluster #21

Closed by PeterMBlack 3 months ago

PeterMBlack commented 5 months ago

I tried generating a local CAZy db on my Mac (8GB RAM), rather than on a cluster with more memory. Halfway through parsing the CAZy text file (51%), it maxes out my memory. Usage statistics suggested it was using 16GB of memory (split between my actual RAM and virtual memory).

HobnobMancer commented 5 months ago

Hi! This is an issue with cazy_webscraper so I've raised an issue over there.

Initial checks indicate this is due to the latest CAZy database release, which generates a file of >4 million lines that must be parsed to identify all unique proteins (identified by their NCBI protein version accession). I'll take a look at the cazy_webscraper code base and attempt to optimise its resource usage.
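To illustrate the kind of parsing involved, here is a minimal sketch of collecting unique accessions from a large text file one line at a time, so that only the set of accessions is held in memory and never the whole file. This is not the cazy_webscraper implementation: the function names, the tab-separated layout, and the accession sitting in the fourth column are all assumptions made for the example.

```python
from typing import Iterable, Iterator, Set


def iter_accessions(lines: Iterable[str], column: int = 3) -> Iterator[str]:
    """Yield the accession field from each tab-separated line.

    The column index is an assumption about the file layout, not a
    documented property of the CAZy text file.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed or short lines rather than failing mid-parse
        if len(fields) > column and fields[column].strip():
            yield fields[column].strip()


def unique_accessions(lines: Iterable[str], column: int = 3) -> Set[str]:
    """Collect unique accessions; memory grows with the number of unique
    accessions, not with the number of lines read."""
    return set(iter_accessions(lines, column))


def unique_accessions_from_file(path: str, column: int = 3) -> Set[str]:
    """Stream a file from disk; iterating the handle reads one line at a time."""
    with open(path, encoding="utf-8") as handle:
        return unique_accessions(handle, column)
```

Because the functions accept any iterable of lines, the same code can be exercised with a small in-memory list before pointing it at a multi-million-line file.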

For progress, check issue 123 at cazy_webscraper; I'll post here once I think the issue has been addressed.

HobnobMancer commented 3 months ago

This should be fixed in the latest version of cazy_webscraper; see the relevant issue in cazy_webscraper.