AutoIDM / tap-clickup

tap-clickup, a Singer-compliant tap for pulling ClickUp data
MIT License

[Feature] Support for TimeEntries parameters #134

Open sayansha opened 2 years ago

sayansha commented 2 years ago

Thanks for providing the tap. However, the current TimeEntries stream only returns the time tracking data of the calling user for the past 30 days. For my data warehousing use case, I need the time tracking data for all users over a longer period. To get all time tracking data (including entries older than 30 days) for all users, the parameters assignee and start_date are required. The include_location_names parameter is also needed to get additional details. Currently, I am doing the following to get the data:

name = "time_entries"
path = "/team/{team_id}/time_entries?start_date=1631743200000&include_location_names=true&assignee={userIds}"

Is it possible to set the parameters mentioned above via the configuration? If not, it would be great to have a feature for the TimeEntries stream, where the parameters mentioned above can be passed through the configuration.
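As a rough illustration of what passing these parameters through configuration could look like, here is a minimal sketch that builds the same path from plain values. The helper names (to_epoch_ms, build_time_entries_path) are hypothetical and not part of tap-clickup. It assumes start_date is sent as Unix time in milliseconds and assignee as a comma-separated list of user IDs, as in the hard-coded path above.

```python
from datetime import datetime, timezone
from typing import Iterable
from urllib.parse import urlencode


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to Unix time in milliseconds."""
    return round(moment.timestamp() * 1000)


def build_time_entries_path(
    team_id: str,
    start: datetime,
    user_ids: Iterable[int],
    include_location_names: bool = True,
) -> str:
    """Build the time_entries path with the three query parameters from this issue.

    Hypothetical helper: the parameter names come from the issue text, the
    function itself is only an illustration.
    """
    params = {
        "start_date": to_epoch_ms(start),
        "include_location_names": str(include_location_names).lower(),
        "assignee": ",".join(str(user_id) for user_id in user_ids),
    }
    # safe="," keeps the comma-separated assignee list readable in the URL.
    return f"/team/{team_id}/time_entries?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    start = datetime(2021, 9, 15, 22, 0, tzinfo=timezone.utc)
    print(build_time_entries_path("123", start, [1, 2]))
```

The three values could come from the tap configuration instead of being hard-coded in the stream's path.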

visch commented 2 years ago

Thanks for the issue @sayansha! I've updated the docs for the Time Entries stream in this PR https://github.com/AutoIDM/tap-clickup/pull/135

Yes, your method looks better, as it would get the full time entry history, as you said.

To implement this I think we'd need to look at a few things (after verifying via Postman that we can even pull this data as expected; sometimes ClickUp's docs don't exactly match what actually happens):

  1. How do we get {userIds} to the time_entries call?

    1. With https://github.com/AutoIDM/tap-clickup/blob/main/tap_clickup/schemas/team.json#L49 membership looks to be available in the teams parent, so we should be able to pass this data to the child streams of team.
    2. Extend this code to return a ["userIds"] list in the dict.
    3. In time_entries, copy the get_url_parameters function here https://github.com/AutoIDM/tap-clickup/blob/main/tap_clickup/streams.py#L241-L252 and modify it to include just the needed assignees.
  2. Pagination on the time_entries endpoint

    1. I'd assume it's paginated. This is pretty easy: copy https://github.com/AutoIDM/tap-clickup/blob/main/tap_clickup/streams.py#L271 and do: if recordcount == 0: newtoken = None, else: newtoken = previous_token + 1
  3. start_date: we probably want this to be incremental, as I'd assume time entries hold a lot of data for most folks.

    1. We could copy the Tasks stream's implementation where we can. That will get us the starting_start_date.
    2. We'll need to dive into the ClickUp API docs to see which field we should use as the replication_key for the teams stream, and how we should adjust get_url_parameters to track this properly.
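Steps 1 and 2 above can be sketched roughly as follows, in plain Python without the SDK classes. Everything here is an assumption for illustration: the function names are hypothetical, the team record is assumed to carry a members list whose items contain a user object with an id, and the page query parameter is only a guess, since pagination on this endpoint has not been verified. Pagination in this sketch stops when a page comes back empty.

```python
from typing import Any, Dict, List, Optional


def team_child_context(team_record: Dict[str, Any]) -> Dict[str, Any]:
    """Build the context handed to child streams of team (hypothetical helper).

    Assumes each item in team_record["members"] looks like {"user": {"id": ...}}.
    """
    user_ids: List[Any] = [
        member["user"]["id"]
        for member in team_record.get("members", [])
        if "user" in member
    ]
    return {"team_id": team_record["id"], "userIds": user_ids}


def time_entries_params(
    context: Dict[str, Any], next_page_token: Optional[int]
) -> Dict[str, Any]:
    """Query parameters for one time_entries request (hypothetical helper)."""
    params: Dict[str, Any] = {"include_location_names": "true"}
    if context.get("userIds"):
        # Assumed format: comma-separated user IDs, as in the issue's path.
        params["assignee"] = ",".join(str(uid) for uid in context["userIds"])
    if next_page_token is not None:
        # Assumption: the endpoint accepts a "page" parameter. Unverified.
        params["page"] = next_page_token
    return params


def next_page_token(record_count: int, previous_token: Optional[int]) -> Optional[int]:
    """Return the next page number, or None once a page comes back empty."""
    if record_count == 0:
        return None
    return (previous_token or 0) + 1


if __name__ == "__main__":
    team = {"id": "123", "members": [{"user": {"id": 1}}, {"user": {"id": 2}}]}
    context = team_child_context(team)
    print(time_entries_params(context, next_page_token(5, None)))
```

In the tap itself this logic would live in the stream methods linked above rather than in free functions.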

    Is it possible to set the parameters mentioned above via the configuration?

Yes, it's very possible; see number 1 in my writeup above!

    If not, it would be great to have a feature for the TimeEntries stream, where the parameters mentioned above can be passed through the configuration.

The issue is with start_date: we'd have to pick a proper start date, and I think this endpoint holds too much data to ask people to pull all time_entries on every run. So we should implement incremental runs, which is number 3 in my writeup.
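To make the incremental idea concrete, here is a minimal sketch of how the start_date for a run could be chosen: use the bookmark saved by the previous run when there is one, otherwise fall back to a configured start date. The names parse_utc and starting_start_date_ms are hypothetical, and both inputs are assumed to be ISO 8601 strings; which field would actually serve as the replication key is still open, as noted above.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating naive values as UTC."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def starting_start_date_ms(config_start_date: str, bookmark: Optional[str] = None) -> int:
    """Pick the start_date for this run, in Unix milliseconds.

    Hypothetical helper: prefers the bookmark from the previous run and
    falls back to the configured start date on the first run.
    """
    chosen = bookmark if bookmark else config_start_date
    return round(parse_utc(chosen).timestamp() * 1000)


if __name__ == "__main__":
    # First run: no bookmark yet, so the configured start date is used.
    print(starting_start_date_ms("2021-09-15T22:00:00Z"))
    # Later run: the saved bookmark takes over.
    print(starting_start_date_ms("2021-09-15T22:00:00Z", "2021-09-16T00:00:00+00:00"))
```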

PRs are welcome if you have the time, @sayansha!