Noblis / INVSC-janus

IARPA Janus Program API

memory issues (janus_create_gallery and janus_serialize_gallery) #30

Closed stephenrawls closed 8 years ago

stephenrawls commented 8 years ago

Hi,

I have another concern about memory consumption. Again, this might not be an issue for the evaluation (because NIST has large machines), but it seems to me like it is an issue for future government customers, especially given the large number of subjects we eventually hope to address by phase 3.

Several current API calls require that a gallery is copied into memory twice, thus doubling the total memory requirements of the program. This happens in two places: 1) The janus_serialize_gallery() and janus_deserialize_gallery() functions. This could be fixed by having these functions write/read directly to files. 2) The janus_create_gallery() function. This one is a bit harder to fix under the current API design.

At the last PI meeting, we briefly talked about the competing interests of optimizing an API for evaluation vs. optimizing an API for actual users, so maybe the answer here is that we simply don't care about the memory usage right now and this API is only meant for evaluation.

Just thought I'd see what people thought.

Thanks, Stephen Rawls (ISI)

JordanCheney commented 8 years ago

Hi Stephen,

One of the goals for this iteration of the API was to not have performers perform any I/O directly. This was intended to facilitate the use case where users don't want and/or can't write directly to a file but might instead want to send the data to a database or do something else. I agree with you that this doubles our memory usage for galleries and I'm open to thoughts on a better way to facilitate this.

stephenrawls commented 8 years ago

Hi Jordan,

For serialization: In C++ I'd just say use a bytestream. In C I don't know the best solution. Perhaps have multiple API entrypoints, e.g. janus_serialize_file(), janus_serialize_socket(), etc. Perhaps use non-standard C, e.g. this POSIX function that can turn a piece of memory into a file pointer: http://pubs.opengroup.org/onlinepubs/9699919799/functions/fmemopen.html
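To make the fmemopen() idea concrete, here is a minimal sketch, assuming a serializer that takes a FILE* instead of filling a buffer. The toy_gallery type and the toy_* functions are hypothetical stand-ins invented for illustration, not the actual Janus types or signatures. Because the serializer only sees a stream, the same function could target a file, a pipe, or (through fmemopen) caller-owned memory, without the implementation building a second in-memory copy first.

```c
/* fmemopen() is POSIX.1-2008, not standard C; request it explicitly. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a gallery; NOT the real Janus type. */
typedef struct {
    const unsigned char *templates; /* flat template bytes */
    size_t size;                    /* number of bytes in templates */
} toy_gallery;

/* Hypothetical stream-based serializer: writes a length header followed
 * by the raw bytes straight to `out`, with no intermediate copy.
 * Returns 0 on success, -1 on a write error. */
static int toy_serialize_gallery(const toy_gallery *g, FILE *out)
{
    assert(g != NULL && out != NULL);
    if (fwrite(&g->size, sizeof g->size, 1, out) != 1)
        return -1;
    if (g->size > 0 && fwrite(g->templates, 1, g->size, out) != g->size)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}

/* Callers that still want the bytes in memory can wrap their own buffer
 * in a FILE* with fmemopen(). Returns the number of bytes written,
 * or -1 on failure. */
static long toy_serialize_to_memory(const toy_gallery *g, void *buf, size_t cap)
{
    FILE *mem = fmemopen(buf, cap, "w");
    if (mem == NULL)
        return -1;
    int rc = toy_serialize_gallery(g, mem);
    long written = (rc == 0) ? ftell(mem) : -1;
    if (fclose(mem) != 0)
        return -1;
    return written;
}
```

A caller that wants a file on disk would pass the result of fopen() to toy_serialize_gallery() directly; a caller that wants a database blob would use toy_serialize_to_memory() with a buffer it owns. This is only one possible shape, and it does rely on POSIX rather than plain C.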

For janus_create_gallery() I guess this one can be solved with the existing API. Just warn users up-front that calling the janus_create_gallery() function will end up doubling memory requirements for all the templates they pass in. If they don't have enough RAM to handle that, then they can instead just iteratively call janus_gallery_insert() while being careful about only loading in enough templates to stay under half their system RAM.
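The iterative approach could look roughly like the sketch below. Everything named toy_* is a hypothetical stand-in (the real janus_gallery_insert() has its own types and signature), and the loader just fabricates filler bytes. The point is only the loop shape: load one template, insert it, free it, so that at most one template lives outside the gallery at a time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real Janus types and signatures differ. */
typedef struct { unsigned char *bytes; size_t size; } toy_template;
typedef struct { unsigned char *bytes; size_t size, capacity, count; } toy_gallery;

/* Hypothetical analogue of janus_gallery_insert(): copies one template
 * into the gallery's own storage. Returns 0 on success, -1 on failure. */
static int toy_gallery_insert(toy_gallery *g, const toy_template *t)
{
    assert(g != NULL && t != NULL);
    if (g->size + t->size > g->capacity) {
        size_t cap = g->capacity ? g->capacity * 2 : 64;
        while (cap < g->size + t->size)
            cap *= 2;
        unsigned char *p = realloc(g->bytes, cap);
        if (p == NULL)
            return -1;
        g->bytes = p;
        g->capacity = cap;
    }
    memcpy(g->bytes + g->size, t->bytes, t->size);
    g->size += t->size;
    g->count++;
    return 0;
}

/* Hypothetical loader: fabricates a template of `size` bytes, each set
 * to the low byte of `id`. A real client would read from disk instead. */
static int toy_load_template(size_t id, size_t size, toy_template *out)
{
    out->bytes = malloc(size);
    if (out->bytes == NULL)
        return -1;
    memset(out->bytes, (int)(id & 0xff), size);
    out->size = size;
    return 0;
}

/* Build the gallery one template at a time, freeing each template
 * immediately after it has been copied into the gallery. */
static int toy_build_gallery_incrementally(toy_gallery *g, size_t n, size_t tsize)
{
    for (size_t i = 0; i < n; i++) {
        toy_template t;
        if (toy_load_template(i, tsize, &t) != 0)
            return -1;
        int rc = toy_gallery_insert(g, &t);
        free(t.bytes); /* the gallery holds its own copy now */
        if (rc != 0)
            return -1;
    }
    return 0;
}
```

Note that the realloc-based growth in this toy gallery can itself copy the storage when it grows, so a real implementation would need its own strategy there; the sketch only illustrates the client-side loop.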

Ideally I think it would save effort on the client (in terms of managing memory), and also effort on the implementation (in terms of a lot of extra memcpy's) if we just had a janus_gallery_insert_and_create_template() function that handled both template creation and gallery insertion at once, although I understand that this would limit the ability to do work on multiple machines under the current API.

Thanks, Stephen Rawls (ISI)

JordanCheney commented 8 years ago

Hi Stephen,

I tried to address this in PR #38. Please let me know if you have comments or questions.

stephenrawls commented 8 years ago

I see, I guess I missed that in my first pass over the API.

It's not a huge issue for the evaluation, but we will need to know which compiler and compiler version NIST will be using. (Presumably gcc 4.x?)

If this API were going to be used by other govt. customers then it might be more of an issue.

JordanCheney commented 8 years ago

NIST will provide all hardware and software specs for the evaluation machine to all of the performers.

JordanCheney commented 8 years ago

Merged PR #38