NetApp / ontap-rest-python

This repository contains sample code illustrating how to access the ONTAP REST API using Python. This Repository also contains ONTAPI Usage reporting code that helps to identify ONTAPI usage in your environment using ONTAP REST APIs.

BSD 3-Clause "New" or "Revised" License

QuotaReport from the python API takes a very (very) long time #44

Closed. orenshaniathuji closed this issue 11 months ago

orenshaniathuji commented 1 year ago

Hi All,

I am using ontap-rest-python to query quota information from our NetApp server.

I am basically just calling netapp_ontap.resources.QuotaReport.get_collection(type=type, max_records=500), first with type=policy_usage and then with type=users_quotas_users. It takes about 0.1 seconds to read each quota record, which means it will take me over an hour and a half to read the info for all our quotas.

Needless to say, the equivalent CLI command, "volume quota report", takes only about 30 seconds to list all the quotas.

I searched for similar issues and couldn't find much. I did see it mentioned somewhere that the cause may be that the API actually queries the quota records one by one, so a lot of HTTP requests are being issued (and each has to be authenticated, etc.), but since I can no longer find that reference, I am not sure that this is the cause.

So is there something I can do to make the queries faster? Otherwise, this is really something you should fix.

BR,

Oren

noorbuchi commented 1 year ago

Hello,

Thank you for pointing out this issue. We've noticed similar performance issues with QuotaReport and other resources in the library and we've made changes to significantly improve the performance of get_collection that will be coming in the next release.

My guess is that most of the time is spent loading and validating the schema of the records returned by ONTAP. This is a limitation of the marshmallow library that we use to enforce the object schema, and marshmallow can be quite slow. To verify that the issue is in fact from the library and not from the ONTAP REST API, try making the same request using curl. It should give you the results much faster than netapp_ontap.
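To make that comparison concrete, here is a minimal timing sketch. The helper name time_call is made up for illustration, and the commented usage lines assume a reachable cluster plus placeholder values (url, auth) that are not from this thread.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result


# Hypothetical usage (placeholders, needs a real cluster):
#   lib_s, _ = time_call(lambda: list(QuotaReport.get_collection(max_records=500)))
#   raw_s, _ = time_call(lambda: requests.get(url, auth=auth, verify=False).json())
# get_collection returns a generator, so it is wrapped in list() to force
# the records to actually be fetched and deserialized before the timer stops.
```

If the raw request is much faster than the library call for the same query, the time is probably going into client-side deserialization rather than into ONTAP itself.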

Until we officially publish the performance improvement, you can use this workaround:

1. Make the request using Python's requests library.
2. Iterate through the response JSON as a Python dictionary and pick the records you're interested in.
3. Convert the record(s) that you want to a QuotaReport object using from_dict.
4. Continue working with the objects as needed.
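The steps above could be sketched roughly as follows. This is only an illustration: the helper names select_records and to_resources are invented here, the payload is assumed to carry its items under a "records" key, and the sample fields are placeholders rather than real quota data. The resource class is passed in so the sketch does not need a cluster to run.

```python
def select_records(payload, wanted):
    """Return the raw record dicts in the payload for which wanted(rec) is true."""
    return [rec for rec in payload.get("records", []) if wanted(rec)]


def to_resources(records, resource_cls):
    """Convert only the selected dicts to library objects via from_dict."""
    return [resource_cls.from_dict(rec) for rec in records]


# Illustrative payload (made-up values), shaped like a decoded JSON response.
payload = {
    "records": [
        {"index": 1, "type": "user"},
        {"index": 2, "type": "tree"},
        {"index": 3, "type": "user"},
    ],
    "num_records": 3,
}

user_records = select_records(payload, lambda rec: rec.get("type") == "user")

# With the library installed, the conversion would look like:
#   from netapp_ontap.resources import QuotaReport
#   reports = to_resources(user_records, QuotaReport)
```

The idea is that filtering happens on plain dictionaries, so the schema validation cost is paid only for the records that are actually needed.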

I hope this helps out.

-Noor

orenshaniathuji commented 1 year ago

Hi Noor,

If ql is a QuotaRule object, and I just replace

ql.get()

with

requests.get('https://' + self.nthost + ql.instance_location, verify=False, auth=self.basic)

Then I still get more or less the same response time of ~ 0.1 seconds per request.

Did you mean that I should replace the QuotaRule.get_collection() call with a requests call? If so then how do I know which URL to fetch?

Many thanks,

Oren

noorbuchi commented 1 year ago

Hello,

The point of attempting this workaround (using Python requests, or curl) is to test if the performance issue is caused by ONTAP REST API, or if it's caused by the library.

My guess is that it's related to the library, because it does a lot of expensive serialization and deserialization of the objects in the response.

You won't be able to tell much of a difference in performance when getting a single instance of QuotaReport with or without the workaround, because it's quick to serialize/deserialize a single record. But in your case, where you're getting a collection of 500 reports at a time, it becomes more expensive to use the library. You will also need multiple requests to get any remaining reports beyond the first 500.
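For the multiple-request case, a rough sketch of following pages by hand might look like this. It assumes each response carries its items under "records" and, when more remain, a relative link under _links.next.href. The function name iter_records and the get_json callable are hypothetical, introduced only for this example.

```python
def iter_records(get_json, path):
    """Yield every record, following next links until none is left.

    get_json is any callable that takes a path and returns the decoded
    JSON body as a dict (for example, a thin wrapper around requests.get).
    """
    while path:
        page = get_json(path)
        yield from page.get("records", [])
        # Stop when the response no longer advertises a next page.
        path = page.get("_links", {}).get("next", {}).get("href")


# Hypothetical wiring with requests (host and auth are placeholders):
#   get_json = lambda p: requests.get("https://" + host + p, auth=auth).json()
#   for rec in iter_records(get_json, "/api/storage/quota/reports?max_records=500"):
#       ...
```

Because the HTTP call is injected, the same loop can be exercised against canned pages before pointing it at a real cluster.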

What I'm suggesting is to do a single request without using the library (and without specifying max_records if you want everything). This will result in a response containing all the records that match your query, with the body in JSON format. You can then iterate through the response as a simple Python dictionary and convert any record to a netapp_ontap object using from_dict().

> If so then how do I know which URL to fetch?

I'm not sure I understand this question. If you're looking for the URL that corresponds to the QuotaReport collection, you can find it on the resources documentation page (you have to scroll down a bit): it's /api/storage/quota/reports. If you're looking for the URL of a single report instance, our REST API documentation page has that information: it's /api/storage/quota/reports/{volume.uuid}/{index}.
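As a small illustration of how those two paths turn into full URLs, here is a sketch that assumes both are served under the /api prefix over HTTPS. The helper names and the example host are made up.

```python
API_ROOT = "/api"
QUOTA_REPORTS = "/storage/quota/reports"


def collection_url(host):
    """URL of the whole quota report collection on the given host."""
    return f"https://{host}{API_ROOT}{QUOTA_REPORTS}"


def instance_url(host, volume_uuid, index):
    """URL of one report, keyed by its volume UUID and index."""
    return f"{collection_url(host)}/{volume_uuid}/{index}"


# Placeholder host and identifiers, for illustration only.
print(collection_url("cluster.example.com"))
print(instance_url("cluster.example.com", "<volume-uuid>", 0))
```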

I hope this helps!

-Noor

orenshaniathuji commented 1 year ago

Hi Noor,

So basically you're telling me not to use the API at all, right?

Thanks

Oren

noorbuchi commented 1 year ago

I'm suggesting that you try using the REST API without the netapp_ontap library to get 500 records or more. This will help us determine if the performance issue is related to ONTAP itself or related to the library.

orenshaniathuji commented 11 months ago

Problem solved: I used the following command to fetch all the requested data quickly:

qurl = resource.get_collection_url(self.hc) + '?type=' + type + '&max_records=10000&fields=*'
query = requests.get(qurl, verify=False, auth=self.basic)

Where "resource" is either QuotaRule or QuotaReport
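For anyone adapting this, one possible variation is to let the standard library build the query string instead of concatenating it by hand. This is only a sketch: build_query_url is a made-up helper, the base URL is passed in as a plain string (in the snippet above it comes from get_collection_url), and the defaults simply mirror the values used there.

```python
from urllib.parse import urlencode


def build_query_url(base_url, quota_type, max_records=10000, fields="*"):
    """Append type, max_records and fields to base_url as a query string."""
    params = {"type": quota_type, "max_records": max_records, "fields": fields}
    # safe="*" keeps the fields wildcard literal instead of encoding it as %2A.
    return base_url + "?" + urlencode(params, safe="*")


# Placeholder host, for illustration only.
qurl = build_query_url("https://cluster.example.com/api/storage/quota/reports", "user")
# Then, as in the snippet above:
#   query = requests.get(qurl, verify=False, auth=basic)
#   records = query.json().get("records", [])
```

Note that verify=False turns off TLS certificate checking, so it is best kept to test setups.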