Orange-OpenSource / documentare-simdoc

New development now takes place on GitLab: https://gitlab.com/Orange-OpenSource/documentare/documentare-simdoc . Library and tools for similarity measurement, classification and clustering of digital content, and for segmentation of images from digitized documents
GNU General Public License v2.0

Memory management with huge data #53

Closed JoelGardes closed 7 years ago

JoelGardes commented 7 years ago

There may be a problem with memory management:
1) Even with the largest heap size (set with the -Xmx Java option), a Java heap error occurs on large directories of raw pictures (1500 JPEG files of 1.8 MB each).
2) In some cases the ncd process runs at 100% memory usage without a Java heap error. Why?
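To help tell the two cases apart, it may be useful to log how close the JVM is to its heap ceiling while ncd runs. The sketch below is not part of documentare-simdoc: the class `HeapReport` and its methods are hypothetical names, and it relies only on the standard `java.lang.Runtime` API. The check in `rawBytesFit` assumes that all input files are held in memory at once, which the issue does not confirm, and it is only a lower bound, since decoding or compression buffers would add to the raw file sizes.

```java
public class HeapReport {

    /** Fraction of the maximum heap (the -Xmx ceiling) currently in use, between 0 and 1. */
    static double usedFraction() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    /**
     * Rough lower-bound check (assumption: every file is loaded at the same time):
     * do the raw input bytes alone fit under the given heap ceiling?
     */
    static boolean rawBytesFit(long fileCount, long bytesPerFile, long maxHeapBytes) {
        return fileCount * bytesPerFile < maxHeapBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max heap: %d MB, used: %.1f%%%n",
                rt.maxMemory() / (1024 * 1024), 100.0 * usedFraction());

        // Figures from the report: 1500 JPEG files of about 1.8 MB each.
        boolean fits = rawBytesFit(1500, 1_800_000L, rt.maxMemory());
        System.out.println("raw input fits under the heap ceiling: " + fits);
    }
}
```

If the used fraction stays near 1 without an error, the process may simply be running close to the ceiling with the garbage collector reclaiming just enough memory, but that would need to be confirmed by measurement.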

(Test data: directory "/Claudia/NewYork", under the home directory of jyig5563, on both the z620 and z820 servers.)

JoelGardes commented 7 years ago

/* Possible suggestion:
1) Split the input directories in prep-data according to the available memory size.
2) Compute NCD with the options (-j1 dirname and -j2 dirname) or (-d1 dirname and -d2 dirname) for each pair of directories. There is no need to compute both orders, because by symmetry [d1 and d2] = [d2 and d1].
3) Merge the JSON files at the end of ncd.

Question: what about thumbnails? (to be discussed) */ (obsolete)

Developing a distributed processing strategy.
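The splitting and pairing idea from the suggestion above can be sketched as follows; the same list of pairs could also serve as the work units of a distributed run. This is an illustration only, not code from documentare-simdoc: `BatchPlanner`, `splitBySize` and `batchPairs` are hypothetical names, and on-disk file size is assumed to be an acceptable proxy for memory use, which may not hold once images are decoded. Each batch is also paired with itself, on the assumption that distances between files of the same batch are needed too.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPlanner {

    /**
     * Greedily groups files (given by their sizes in bytes) into consecutive batches
     * whose total size stays within budgetBytes. Returns the file indices per batch.
     * A single file larger than the budget gets a batch of its own.
     */
    static List<List<Integer>> splitBySize(long[] fileSizes, long budgetBytes) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long used = 0;
        for (int i = 0; i < fileSizes.length; i++) {
            // Start a new batch when adding this file would exceed the budget.
            if (!current.isEmpty() && used + fileSizes[i] > budgetBytes) {
                batches.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(i);
            used += fileSizes[i];
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    /**
     * Lists unordered batch pairs (i, j) with i <= j. Since [d1 and d2] = [d2 and d1],
     * the pair (j, i) is never emitted once (i, j) has been.
     */
    static List<int[]> batchPairs(int batchCount) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < batchCount; i++) {
            for (int j = i; j < batchCount; j++) {
                pairs.add(new int[] {i, j});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        long[] sizes = {5, 5, 5, 5, 5};
        List<List<Integer>> batches = splitBySize(sizes, 10);
        System.out.println("batches: " + batches);
        for (int[] p : batchPairs(batches.size())) {
            System.out.println("compute NCD for batch " + p[0] + " vs batch " + p[1]);
        }
    }
}
```

With k batches this gives k(k+1)/2 runs instead of k², and each run would write its own JSON file to be merged at the end, as in step 3.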

JoelGardes commented 7 years ago

Obsolete