PoonLab / covizu

Rapid analysis and visualization of coronavirus genome variation
https://filogeneti.ca/CoVizu/
MIT License

Enhance apply_features function for more efficiency #488

Closed by SandeepThokala 11 months ago

SandeepThokala commented 1 year ago

https://github.com/PoonLab/covizu/blob/4ab1ba45b919a45d974c0dc58bf64f09c7dfadb1/covizu/utils/seq_utils.py#L104 Storing the sequence as a list of single-character strings uses more memory than storing the same characters in one contiguous string, because the list holds a separate reference to each element. The difference can be significant when the list is large.
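To illustrate the memory point, here is a minimal sketch comparing the two representations with `sys.getsizeof()`. The sequence below is a made-up stand-in for the reference genome, not the project's data, and the exact byte counts depend on the Python build, so none are shown.

```python
import sys

# Hypothetical stand-in for a reference genome of roughly 30,000 bases
refseq = "ACGT" * 7500

# The same characters as a list of single-character strings
as_list = list(refseq)

# getsizeof() on a list counts the list object and its array of
# references, not the string objects those references point to
list_bytes = sys.getsizeof(as_list)
str_bytes = sys.getsizeof(refseq)

print(f"list: {list_bytes} bytes, str: {str_bytes} bytes")
```

On a 64-bit CPython build each list slot is an 8-byte reference, while an ASCII string stores one byte per character, so the list should come out several times larger.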

ArtPoon commented 1 year ago

Is this function covered by the unit test suite? Please feel free to optimize but make sure that the results are the same.

ArtPoon commented 1 year ago

Regarding commit 1b714b3, I think it is cleaner to generate a new string than to re-use the old variable name and replace the previous string, @SandeepThokala.

ArtPoon commented 1 year ago

@SandeepThokala can you please post some timing results (with old and new code) from processing either the unit test or a large set of data, in case the test fixture is processed too quickly for meaningful timing results?

GopiGugan commented 1 year ago

@GopiGugan to provide @SandeepThokala with a larger dataset to generate the timing results

ArtPoon commented 11 months ago

@SandeepThokala can you please report timing and RAM results here?

SandeepThokala commented 11 months ago

Memory was measured by calling sys.getsizeof() on the result object.

| len(refseq) | Code | Time taken (secs) | Memory (bytes) |
| --- | --- | --- | --- |
| 29903 | old | 0.005 | 239288 |
| 29903 | new | 0.001 | 29952 |
| 100000 | old | 0.001 | 8000056 |
| 100000 | new | 0.01 | 1000049 |

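For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how timing and peak memory could be measured. The two functions are hypothetical simplifications written for illustration, not the actual old and new `apply_features` code, and they assume substitutions are given as `(position, base)` pairs at distinct positions. `tracemalloc` is used for peak memory because `sys.getsizeof()` on a list does not include temporary allocations.

```python
import timeit
import tracemalloc


def apply_list(refseq, subs):
    """Hypothetical list-based approach: copy to a list, mutate, join."""
    result = list(refseq)
    for pos, base in subs:
        result[pos] = base
    return ''.join(result)


def apply_slices(refseq, subs):
    """Hypothetical string-based approach: join untouched slices and new bases.

    Assumes every position in subs is distinct.
    """
    parts = []
    prev = 0
    for pos, base in sorted(subs):
        parts.append(refseq[prev:pos])  # unchanged stretch before this site
        parts.append(base)              # substituted base
        prev = pos + 1
    parts.append(refseq[prev:])         # unchanged tail
    return ''.join(parts)


def peak_bytes(func, *args):
    """Return the peak traced memory (bytes) while func runs once."""
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


refseq = "ACGT" * 7500                       # made-up sequence
subs = [(10, 'T'), (500, 'G'), (29000, 'C')]  # made-up substitutions

# Both approaches must give identical output before timing means anything
assert apply_list(refseq, subs) == apply_slices(refseq, subs)

for func in (apply_list, apply_slices):
    secs = timeit.timeit(lambda: func(refseq, subs), number=100) / 100
    print(func.__name__, f"{secs:.6f} secs", peak_bytes(func, refseq, subs), "bytes")
```

Averaging over many repetitions with `timeit` helps when a single call finishes in about a millisecond, as in the table above.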
ArtPoon commented 11 months ago

Thanks @SandeepThokala, go ahead and merge your changes into the dev branch please