box / box-python-sdk

Box SDK for Python
http://opensource.box.com/box-python-sdk/
Apache License 2.0
418 stars 215 forks source link

trash().get_items does not support user_marker=True #776

Closed emboehm closed 1 year ago

emboehm commented 1 year ago

I noticed that when I tried to use

trash_items = client.trash().get_items(use_marker=True)

I would get the following error

Traceback (most recent call last):
  File "./myapp_marker.py", line 21, in <module>
    trash_items = client.trash().get_items(use_marker=True)
  File "/work/box-python-sdk/boxsdk/util/api_call_decorator.py", line 63, in call
    return method(*args, **kwargs)
TypeError: get_items() got an unexpected keyword argument 'use_marker'

I verified that the API supports use_marker=True for trash https://developer.box.com/reference/get-folders-trash-items/

Steps to Reproduce

  1. Create a box client object but authorizing to Box. I used OAuth2 for my example.
  2. Try to run this code 'trash_items = client.trash().get_items(use_marker=True)'

Expected Behavior

I expected to receive a list of items from the trash.

Error Message, Including Stack Trace

Traceback (most recent call last):
  File "./myapp_marker.py", line 21, in <module>
    trash_items = client.trash().get_items(use_marker=True)
  File "/work/box-python-sdk/boxsdk/util/api_call_decorator.py", line 63, in call
    return method(*args, **kwargs)
TypeError: get_items() got an unexpected keyword argument 'use_marker'

Possible fix

I looked at the code in https://box-python-sdk.readthedocs.io/en/latest/_modules/boxsdk/object/trash.html#Trash.get_items and compared it with the code for folders, that correctly implements use_marker=True https://box-python-sdk.readthedocs.io/en/latest/_modules/boxsdk/object/folder.html#Folder.get_items

I identified the missing code and added the changes to boxsdk/object/trash.py.

diff --git a/boxsdk/object/trash.py b/boxsdk/object/trash.py
index 0f421c3..fe8363a 100644
--- a/boxsdk/object/trash.py
+++ b/boxsdk/object/trash.py
@@ -4,6 +4,7 @@
 from .base_endpoint import BaseEndpoint

 from ..pagination.limit_offset_based_object_collection import LimitOffsetBased
+from ..pagination.marker_based_object_collection import MarkerBasedObjectColle
 from ..util.api_call_decorator import api_call

 if TYPE_CHECKING:
@@ -91,7 +92,12 @@ def permanently_delete_item(self, item: 'BaseItem') -> bool:
         return box_response.ok

     @api_call
-    def get_items(self, limit: Optional[int] = None, offset: Optional[int] = N
+    def get_items(self,
+                  limit: Optional[int] = None,
+                  offset: Optional[int] = None,
+                  marker: Optional[str] = None,
+                  use_marker: bool = False,
+                  fields: Iterable[str] = None) -> 'BoxObjectCollection':
         """
         Using limit-offset paging, get the files, folders and web links that a

@@ -99,11 +105,29 @@ def get_items(self, limit: Optional[int] = None, offset: O
             The maximum number of entries to return per page. If not specified
         :param offset:
             The offset of the item at which to begin the response.
+        :param marker:
+            The paging marker to start returning items from when using marker-
+        :param use_marker:
+            Whether to use marker-based paging instead of offset-based paging,
         :param fields:
             List of fields to request.
         :returns:
             An iterator of the entries in the trash
         """
+
+        additional_params = {}
+        if use_marker:
+            additional_params['usemarker'] = True
+            return MarkerBasedObjectCollection(
+                session=self._session,
+                url=self._session.get_url('folders', 'trash', 'items'),
+                limit=limit,
+                marker=marker,
+                fields=fields,
+                additional_params=additional_params,
+                return_full_pages=False,
+            )
+
         return LimitOffsetBasedObjectCollection(
             session=self._session,
             url=self._session.get_url('folders', 'trash', 'items'),

I was able to list items from the trash using marker based pagination after making these changes.

Versions Used

Python SDK: 3.5.0 Python: 3.7.1

arjankowski commented 1 year ago

Hi @emboehm, thanks for reporting this to us!

May I ask why you need use marker paging instead of limit-offset?

As you can see, limit-offset paging is fully supported and you can read about it here .

When it comes to add support to user-marker paging, I've already created a ticket SDK-2810 for this.

We will discuss the plan and timeline for this feature with the SDK team.

Regards, Artur

emboehm commented 1 year ago

Hi Artur,

I didn't see your ticket SDK-2810. I tried a search for the issue with author:arjankowski type:ticket. Could you provide a pointer to the ticket. I have a question that might be answered in the ticket.

I am using marker paging instead of limit-offset because limit-offset wasn't working. It would just hang.

We have an enterprise account with hundreds of thousands of items in the trash. We have opened several tickets with Box on the problems we are having because of this, without a resolution.

100K+ items makes it impractical to clean up the trash -- permanently delete items -- through the GUI. That is why I am resorting to using the sdk to permanently delete items.

I cannot get a listing of the trash in a browser without using /trash?usemarker=true.

I tried to get_items with the sdk but that would hang for several minutes before I killed it. That is when I went looking for the option to use use_marker=True, based on my experience with the browser.

The marker pagination also seems to be significantly faster than limit-offset.

This is marker pagination:

Loop iterations:   1000, Elapsed time: 0:00:08
Total iterations:  1000, Total time:   0:00:08
Current time: 2022-12-08T05:12:35

This is limit-offset pagination:

Loop iterations:   1000, Elapsed time: 0:01:26
Total iterations:  1000, Total time:   0:01:26
Current time: 2022-12-08T04:58:57

marker pagination is about 10x faster (8 seconds vs 86 seconds)

-- Eric

arjankowski commented 1 year ago

Hi @emboehm, I created this ticket internally on our Jira, that's why you don't see it. But when it comes to adding a 'marker', we planned the works for the beginning of January.

Regards, Artur

emboehm commented 1 year ago

Artur,

My question then is this: why are there two separate implementations for get_item/get_items depending on whether it is a folder or Trash?

It seems to me like the code is duplicated, creating a bit of a maintenance headache. Updates/fixes to folder.get_item(s) don't necessarily make it to trash.get_item(s).

arjankowski commented 1 year ago

Hi @emboehm,

Thank you for your feedback, we will certainly take it into account when implementing this change.

Folders and Trash APIs are separate services, operating on a different domain, which have separate implementations in the backend and a different direction of development. The fact that at the moment they have the same API may change.

Therefore, in our python implementation, they do not have a common implementation, but instead use common generic components such as MarkerBasedObjectCollection or LimitOffsetBasedObjectCollection.

Regards, Artur

emboehm commented 1 year ago

Hi Artur,

I appreciate the explanation. Trash looked like just another instance of folder at first glance. It makes sense that the implementations are separate, given that they are separate services.

-- Eric

congminh1254 commented 1 year ago

Good morning @emboehm,

As this issue already fixed in the PR #781, and it will be ready in new the Box SDK version which will release in upcoming weeks. We're closing this issue for now, if you have any other problem, feel free to open it again.

Regards, Minh

lukaszsocha2 commented 1 year ago

Hi @emboehm, the feature is now ready to use in new Python boxsdk 3.6.0 release. Best, @lukaszsocha2

emboehm commented 1 year ago

Hi @lukaszsocha2

I pulled the latest release and it is working fine for me.

Thanks, @emboehm