adsabs / export_service

Export service to output ADS records in various formats including BibTeX, AASTeX, and multiple tagged and XML options
MIT License

some performance improvement suggestions #220

Open romanchyla opened 2 years ago

romanchyla commented 2 years ago

memory optimization

given the memory problems we have detected, one possible optimization is to release each document as soon as it has been processed, so it does not stay in memory for the rest of the export

https://github.com/adsabs/export_service/blob/9eab9377c4ab630340a9c021ff040e7d6c15bd2b/exportsrv/formatter/fieldedFormat.py#L621
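A minimal sketch of the idea, not the actual code in fieldedFormat.py: the function name `format_and_release` and the `format_one` callback are hypothetical, and it assumes the documents sit in a plain list that nothing else needs afterwards.

```python
def format_and_release(docs, format_one):
    """Format each document, then drop the list's reference to it."""
    chunks = []
    for i in range(len(docs)):
        chunks.append(format_one(docs[i]))
        # Replace the slot with None so the processed document can be
        # garbage-collected (provided nothing else still references it).
        docs[i] = None
    return "".join(chunks)


# Hypothetical documents, for illustration only.
docs = [{"bibcode": "A", "title": "first"}, {"bibcode": "B", "title": "second"}]
output = format_and_release(docs, lambda d: d["title"] + "\n")
```

This only helps if the formatter holds the last reference to each document; if the caller keeps its own copy of the list contents, nothing is freed.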

other optimizations

this part is quadratic; it also makes the Python list work extra hard by calling list.pop(), because removing an item from anywhere but the end forces Python to shift all the remaining elements

https://github.com/adsabs/export_service/blob/master/exportsrv/utils.py#L92

for better results:

1. turn the docs into a dict d, keyed by bibcode
2. then do:

    for bibcode in bibcodes:
        if bibcode in d:
            new_docs.append(d.pop(bibcode))
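The two steps above could look roughly like this as a self-contained function. This is a sketch under assumptions: the name `order_docs_by_bibcodes` is made up, and each doc is assumed to be a dict with a `bibcode` key, which may not match the real structure in utils.py.

```python
def order_docs_by_bibcodes(docs, bibcodes):
    """Return the docs in the order given by bibcodes, in linear time."""
    # One pass to index the docs; lookups and pops on a dict are O(1) on average.
    # If two docs share a bibcode, the later one wins.
    by_bibcode = {doc["bibcode"]: doc for doc in docs}

    new_docs = []
    for bibcode in bibcodes:
        if bibcode in by_bibcode:
            # pop() removes the entry, so a repeated bibcode is emitted only once.
            new_docs.append(by_bibcode.pop(bibcode))
    return new_docs


# Hypothetical input, for illustration only.
docs = [{"bibcode": "B"}, {"bibcode": "A"}, {"bibcode": "C"}]
ordered = order_docs_by_bibcodes(docs, ["A", "X", "B", "A"])
```

Building the dict costs one pass over the docs and the loop costs one pass over the bibcodes, so the total work grows linearly instead of quadratically.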

this is another quadratic issue (and the same applies to all the similar places)

in Python, strings are immutable, so += can copy the whole string built so far each time it is used; that is a problem here because export builds a large textual output, so every added string costs more than the one before

https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L262 https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L522

it is better to keep appending to a list and then return ''.join(list) at the end
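A small sketch of the list-and-join pattern. The function name `build_export` and the sample entries are illustrative assumptions, not taken from bibTexFormat.py.

```python
def build_export(entries):
    """Concatenate formatted entries without repeated string copying."""
    parts = []
    for entry in entries:
        # Appending to a list is amortized O(1); no large string is copied here.
        parts.append(entry)
        parts.append("\n\n")
    # A single join allocates the final string once.
    return "".join(parts)


# Hypothetical entries, for illustration only.
result = build_export(["@ARTICLE{a}", "@ARTICLE{b}"])
```

The same change applies wherever a formatter accumulates output with += inside a loop.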