Kitware / UPennContrast

https://upenn-contrast.netlify.com/
Apache License 2.0

Upload multiple property endpoint is slow #670

Closed arjunrajlab closed 3 months ago

arjunrajlab commented 4 months ago

When using this command:

workerClient.add_multiple_annotation_property_values(dataset_property_value_dict)

The call is very slow. Computing the properties is now very fast, which makes sense because they are all computed at once, but uploading them to the server is quite slow. The reformatting step could be the slow part:

reformatedValues = []
propertyId = self.propertyId
for datasetId in values:
    for annotationId in values[datasetId]:
        reformatedValues.append({
            "datasetId": datasetId,
            "annotationId": annotationId,
            "values": {
                propertyId: values[datasetId][annotationId]
            },
        })

But I suspect it is this call that is slow:

self.annotationClient.addMultipleAnnotationPropertyValues(reformatedValues)
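
One way to tell the two suspects apart is to time the reformatting and the upload separately. The following is a minimal sketch, not the worker client's actual code: `reformat_values`, `timed`, and `fake_upload` are hypothetical helper names, the input data is synthetic, and `fake_upload` only stands in for the real `addMultipleAnnotationPropertyValues` call.

```python
import time


def reformat_values(values, property_id):
    """Flatten {datasetId: {annotationId: value}} into a list of upload entries."""
    return [
        {
            "datasetId": dataset_id,
            "annotationId": annotation_id,
            "values": {property_id: value},
        }
        for dataset_id, annotations in values.items()
        for annotation_id, value in annotations.items()
    ]


def timed(label, func, *args):
    """Call func(*args), print the elapsed wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


def fake_upload(entries):
    # Hypothetical stand-in for the real upload call; replace it with
    # self.annotationClient.addMultipleAnnotationPropertyValues to measure the server.
    return len(entries)


# Synthetic input: 3 datasets with 1000 annotations each (made-up values).
values = {
    f"dataset{d}": {f"annotation{a}": {"area": a} for a in range(1000)}
    for d in range(3)
}

entries, reformat_seconds = timed("reformat", reformat_values, values, "property0")
uploaded, upload_seconds = timed("upload", fake_upload, entries)
```

If the reformatting time is negligible next to the upload time once the real call is swapped in, the bottleneck is on the server side rather than in the Python loop.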

I talked with @manthey about it, and he suggested that the problem may be the database setting user access for each property entry, or perhaps something else in that loop. But again, I'm not sure.

bruyeret commented 3 months ago

Command for profiling with pprofile:

python -m pprofile -o /tmp/profile.dat girder serve --host 0.0.0.0

To profile MongoDB, open a shell in the MongoDB Docker container and run:

exec mongosh
use girder
db.setProfilingLevel(2)
// Run the slow operation here
db.setProfilingLevel(0)
db.system.profile.find().limit(2).sort( { millis : -1 } ).pretty()
db.system.profile.drop()

Don't forget to run db.setProfilingLevel(0) afterwards, so that the profiler does not fill the disk with logs.
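
The find() call above shows only the two slowest operations. To see where the time goes overall, the profile entries can also be grouped by operation and namespace. The sketch below assumes the entries have already been exported as a list of dicts; it relies only on the `op`, `ns`, and `millis` fields that the profiler writes. `summarize_profile` is a hypothetical helper, and the sample entries and collection names are made up.

```python
from collections import defaultdict


def summarize_profile(entries):
    """Group profiler entries by (op, ns); return rows sorted by total millis, descending."""
    totals = defaultdict(lambda: {"count": 0, "millis": 0})
    for entry in entries:
        key = (entry.get("op", "?"), entry.get("ns", "?"))
        totals[key]["count"] += 1
        totals[key]["millis"] += entry.get("millis", 0)
    rows = [
        (op, ns, stats["count"], stats["millis"])
        for (op, ns), stats in totals.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Made-up sample entries; the collection names are placeholders.
sample_entries = [
    {"op": "update", "ns": "girder.collection_a", "millis": 12},
    {"op": "query", "ns": "girder.collection_b", "millis": 5},
    {"op": "update", "ns": "girder.collection_a", "millis": 8},
]

summary = summarize_profile(sample_entries)
for op, ns, count, millis in summary:
    print(f"{op:8s} {ns:24s} count={count:4d} total={millis} ms")
```

Many fast operations on a single collection can add up to more time than one slow query, which a listing sorted by individual duration would not reveal.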