1. turn the docs into a dict d
2. then do:
    for bibcode in bibcodes:
        if bibcode in d:
            new_docs.append(d.pop(bibcode))
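The two steps above can be sketched as a runnable function. This is only a minimal sketch, not the code in export_service: the name `reorder_docs` and the assumption that each doc is a dict carrying a `bibcode` key are hypothetical.

```python
def reorder_docs(docs, bibcodes):
    """Return docs in the order given by bibcodes, skipping unknown bibcodes."""
    # Step 1: index the docs by bibcode once; a single O(n) pass.
    # Assumption: every doc is a dict with a 'bibcode' key.
    d = {doc['bibcode']: doc for doc in docs}

    # Step 2: membership test and pop on a dict are O(1) on average,
    # so the whole loop is linear rather than quadratic.
    new_docs = []
    for bibcode in bibcodes:
        if bibcode in d:
            new_docs.append(d.pop(bibcode))
    return new_docs
```

Because each doc is popped from the dict when it is used, a bibcode that is requested twice is only returned once.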
memory optimization
given the memory problems we have detected, one possible optimization is to remove a document once it has been processed
https://github.com/adsabs/export_service/blob/9eab9377c4ab630340a9c021ff040e7d6c15bd2b/exportsrv/formatter/fieldedFormat.py#L621
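One way to release each document as soon as it has been processed is to consume the collection destructively. The following is a minimal sketch under stated assumptions, not the code at the link above: `format_and_release` and the `format_doc` callback are hypothetical names, and the docs are assumed to arrive as a plain list that the caller no longer needs afterwards.

```python
from collections import deque


def format_and_release(docs, format_doc):
    """Format each doc in order, letting go of it once it is processed."""
    # A deque pops from the left in O(1); list.pop(0) would be O(n) per call.
    pending = deque(docs)
    docs.clear()  # empties the caller's list so it no longer holds the docs

    parts = []
    while pending:
        doc = pending.popleft()  # the deque no longer references this doc
        parts.append(format_doc(doc))
        del doc                  # drop the last local reference
    return ''.join(parts)
```

Note that this empties the list passed in, so it only fits call sites where the docs are not reused after formatting.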
other optimizations
this part is quadratic; it also makes the Python list work extra hard by calling list.pop(), because Python has to shift the remaining elements every time an item is removed from the middle of the list:
https://github.com/adsabs/export_service/blob/master/exportsrv/utils.py#L92
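for better results, turn the docs into a dict keyed by bibcode and pop each requested bibcode from the dict, so each lookup is O(1) on average instead of a scan through the list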
this is another quadratic issue (and so are all the similar places)
in Python, strings are immutable, so += can copy the whole string every time it is used (CPython sometimes avoids the copy, but that is not guaranteed) -- which is problematic here because export is building large textual output, so it gets more expensive with every added string:
https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L262
https://github.com/adsabs/export_service/blob/master/exportsrv/formatter/bibTexFormat.py#L522
better to keep appending to a list and then return ''.join(list)
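A minimal sketch of the append-then-join pattern. The function name `format_records` and the record layout (dicts with `bibcode` and `title` keys) are hypothetical and only illustrate the idea; this is not the bibTexFormat code.

```python
def format_records(records):
    """Build one large string from many small pieces."""
    parts = []
    for record in records:
        # Appending to a list is amortized O(1) and does not copy
        # the output accumulated so far.
        parts.append('@ARTICLE{' + record['bibcode'] + ',\n')
        parts.append('  title = {' + record['title'] + '}\n')
        parts.append('}\n\n')
    # The full output is assembled once, in a single pass over the pieces.
    return ''.join(parts)
```

The total work is proportional to the size of the final output, instead of growing with every string that is added.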