AdamaJava / adamajava

Qsig remove tokenising #272

Closed holmeso closed 3 years ago

holmeso commented 3 years ago

Description

A couple of changes to support qsignature's foray into the world of different (larger) positions files. First, some instances where the TabTokenizer class was being used unnecessarily have been replaced with a less CPU-intensive process (according to jvisualise).
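To illustrate the kind of saving involved, here is a minimal sketch of reading a single tab-delimited field with `String.indexOf` instead of tokenising the whole line. It is not the code from this PR: the class `TabFields`, the method `field`, and the sample record are hypothetical, and the sketch assumes the caller only needs one or two leading columns.

```java
final class TabFields {
    private TabFields() {}

    /**
     * Returns the zero-based tab-delimited field at the given index without
     * splitting the whole line, or null if the line has too few fields.
     * Hypothetical helper for illustration only.
     */
    static String field(String line, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        int start = 0;
        // Skip over the first `index` tabs; no substrings are allocated here.
        for (int i = 0; i < index; i++) {
            int tab = line.indexOf('\t', start);
            if (tab < 0) {
                return null; // fewer fields than requested
            }
            start = tab + 1;
        }
        int end = line.indexOf('\t', start);
        return end < 0 ? line.substring(start) : line.substring(start, end);
    }

    public static void main(String[] args) {
        String record = "chr1\t12345\trs1\tA\tG"; // made-up record
        System.out.println(field(record, 0) + ":" + field(record, 1));
    }
}
```

Only the requested field is copied into a new string, so lines with many columns avoid the per-column allocations that a full split would incur.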

Second, a new option (maxCacheSize) has been introduced for use by the Compare class. It lets the user specify how many qsignature VCF files are held in cache for the comparison. Previously, all files were loaded into cache before the comparisons were performed, but when positions files are large, and the qsignature VCF files are therefore also large, this approach can cause OutOfMemory errors.

The downside of limiting the cache size is that some files will be loaded more than once, which increases both the runtime and the I/O load.
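The trade-off can be sketched with a small bounded cache that counts how often a file has to be loaded during an all-pairs comparison. This is an illustration under stated assumptions, not the Compare class itself: `BoundedCacheSketch`, `loadsForAllPairs`, the least-recently-used eviction policy, and the stand-in loader are all hypothetical, and the eviction policy used by qsignature may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BoundedCacheSketch<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> cache;
    private int loads;

    BoundedCacheSketch(int maxCacheSize, Function<K, V> loader) {
        if (maxCacheSize < 1) {
            throw new IllegalArgumentException("maxCacheSize must be >= 1");
        }
        this.loader = loader;
        // Access-ordered map: the least recently used entry is evicted
        // once the cache grows beyond maxCacheSize (assumed policy).
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxCacheSize;
            }
        };
    }

    /** Returns the cached value, loading (and counting the load) on a miss. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    int loads() {
        return loads;
    }

    /** Visits every unordered pair of files and returns the number of loads needed. */
    static int loadsForAllPairs(List<String> files, int maxCacheSize) {
        // The loader is a stand-in for reading a qsignature vcf file from disk.
        BoundedCacheSketch<String, String> cache =
                new BoundedCacheSketch<>(maxCacheSize, name -> "contents of " + name);
        for (int i = 0; i < files.size(); i++) {
            for (int j = i + 1; j < files.size(); j++) {
                cache.get(files.get(i));
                cache.get(files.get(j));
            }
        }
        return cache.loads();
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.vcf", "b.vcf", "c.vcf", "d.vcf");
        System.out.println("cache of 4: " + loadsForAllPairs(files, 4) + " loads");
        System.out.println("cache of 2: " + loadsForAllPairs(files, 2) + " loads");
    }
}
```

When the cache can hold every file, each file is loaded once; with a smaller cache, evicted files are reloaded for later pairs, which is the extra runtime and I/O described above.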

Type of change


How Has This Been Tested?

Unit tests have been extended. Updated code has been run and compared against existing code with identical results.

Are WDL Updates Required?

No WDL updates are required. By default, the new option leaves the behaviour unchanged from before.

Checklist: