AutoIDM / tap-clickup

tap-clickup, a Singer-compliant tap for pulling ClickUp data
MIT License

Team, and Space filtering #143

Open Nevsksar opened 1 year ago

Nevsksar commented 1 year ago

I've been actively testing the ClickUp tap for the past few days and have identified some opportunities for improvement, particularly for users managing large ClickUp environments with multiple workspaces. Some users, including people who manage ClickUp for their companies, have access to numerous workspaces, and therefore to hundreds of spaces, lists, folders, and projects. Currently, the tap retrieves data from every workspace the configured API key can access, which overloads the extraction process.

Possible enhancements:

1 - Configurable Workspace Filtering: It would be highly beneficial to add a feature that allows users to configure which specific workspaces (team_ids) the tap should fetch data from. This customization would enable users to focus on extracting data only from the workspaces that are relevant to them.

2 - Custom Data Filtering: Additionally, it would be great to be able to configure which spaces, lists, folders, and projects to fetch data from. These parameters are already available in ClickUp's API and could be integrated into the tap; some of them are already hardcoded into streams.py. Inspiration for this feature can be drawn from the Airbyte variant, which supports custom data filtering.

I'm attempting a workaround by hardcoding some settings in the streams.py file and filtering downstream. However, due to the volume of spaces I have access to, this approach is inefficient and resource-intensive, since it fetches all data before applying any filtering. Unfortunately, I have no coding skills; otherwise I would develop these changes myself instead of trying to set up workarounds.
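To make the cost difference concrete, here is a minimal sketch that counts simulated requests when teams are pruned before crawling versus when everything is fetched and filtered afterwards. The hierarchy, the `crawl` function, and the one-request-per-object cost model are all assumptions made up for illustration; they are not taken from the tap or from ClickUp's API.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory stand-in for the hierarchy: team -> spaces.
WORKSPACES: Dict[str, List[str]] = {
    "team_a": ["space_1", "space_2"],
    "team_b": ["space_3", "space_4", "space_5"],
}


def crawl(wanted_teams: Optional[Set[str]] = None) -> Dict[str, object]:
    """Walk the hierarchy and count the simulated requests it would cost."""
    requests = 1  # one request to list the teams
    spaces: List[str] = []
    for team_id, team_spaces in WORKSPACES.items():
        if wanted_teams is not None and team_id not in wanted_teams:
            continue  # pruned before any request is made for this team
        requests += 1  # one request to list this team's spaces
        for space in team_spaces:
            requests += 1  # one request per space for its contents
            spaces.append(space)
    return {"requests": requests, "spaces": spaces}


# Downstream filtering: crawl everything, then discard unwanted spaces.
everything = crawl()
downstream = [s for s in everything["spaces"] if s in WORKSPACES["team_a"]]

# Upstream filtering: skip unwanted teams before crawling them.
upstream = crawl(wanted_teams={"team_a"})
```

Both approaches end with the same two spaces, but in this toy model the unfiltered crawl costs 8 requests and the pruned one costs 4. With hundreds of lists per workspace the gap would grow accordingly.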

visch commented 12 months ago

@Nevsksar thanks for the issue!

Glad you're using the tap, sorry it isn't quite working for you.

Adding filters for workspaces, teams, etc. is pretty straightforward. For example, we could add something like

https://github.com/AutoIDM/tap-indeed/blob/main/tap_indeedsponsoredjobs/streams.py#L80-L87

I'd just propose that the configuration names be something like teams_inject and then accept a list of team_ids via configuration. That should address 1/2.
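As a rough sketch of that idea (not the code from the linked example, and not the tap's actual stream classes), the team stream could yield the configured IDs directly and skip the API call that discovers teams. The function name `iter_team_records`, the `fetch_all_teams` callable, and the record shape `{"id": ...}` are hypothetical; only the `teams_inject` setting name comes from the proposal above.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Mapping


def iter_team_records(
    config: Mapping[str, Any],
    fetch_all_teams: Callable[[], Iterable[Dict[str, Any]]],
) -> Iterator[Dict[str, Any]]:
    """Yield one record per team that the tap should sync."""
    injected = config.get("teams_inject")
    if injected:
        # Configured IDs win: no discovery request is made at all.
        for team_id in injected:
            yield {"id": str(team_id)}
        return
    # No filter configured: fall back to every team the API key can see.
    yield from fetch_all_teams()
```

Because child streams (spaces, folders, lists, tasks) would take their team ID from these records, limiting the teams here would also limit everything fetched beneath them. An empty or missing `teams_inject` keeps today's behaviour.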

Inspiration for this feature can be drawn from the Airbyte variant, which supports custom data filtering.

  1. From looking at Airbyte's tap, there isn't a dynamic option, which is really painful for most of ClickUp's data model and for the ClickUp users I've worked with.

Questions I have for you

  1. How long does a run take for you? You say it "overloads the extraction process"; what does that mean exactly?
  2. Happy to accept PRs. You could copy from the example above and we'd provide feedback to get this into production for you :D

Nevsksar commented 11 months ago

Apologies for the late response; I am new to much of this.

Regarding question 1: the pipeline would run for more than a couple of hours and would sometimes fail because of an issue with a workspace or list being read. That's because my API key has access to 5 workspaces, and one of them is an enterprise workspace, so there are dozens of spaces, hundreds of lists, and thousands of tasks to fetch.

As for question 2: as we discussed in Slack, I asked someone to help me submit 2 PRs (hopefully I did everything OK). One is a bug fix for how booleans are returned by ClickUp's API, which was messing up the custom values data. The other contains the changes to implement this feature.

On these I have two issues:

  1. As written, the change does not allow different settings per environment, which restricts how the code can be used. I cannot figure out how to correctly read from the meltano.yml file according to the environment being used.
  2. I could not find how to add data to the meltano.yml file. At first I just edited the file (hardcoding the parameters). Then I found a tap-clickup--autoidm.lock file inside Meltano's installation that I figured would allow me to set the data using the CLI (config -set ...), but I do not know how to add this to the tap code or how to set it for different environments.
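One way to think about both issues, as far as I understand Meltano: it resolves the active environment itself and hands the tap only the final setting values, so the tap code should not need to read meltano.yml or know which environment is active. What the tap may need is tolerance for how a list setting arrives, since a value set through the CLI or an environment variable can show up as a string. The helper below is a hypothetical sketch of that normalisation; `parse_id_list` is not part of the tap or of any library.

```python
import json
from typing import Any, List


def parse_id_list(value: Any) -> List[str]:
    """Normalise a configured ID list to a list of non-empty strings."""
    if value is None or value == "":
        return []
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("["):
            value = json.loads(text)  # JSON array, e.g. '["1", "2"]'
        else:
            value = text.split(",")  # comma-separated, e.g. "1, 2"
    if not isinstance(value, (list, tuple)):
        value = [value]  # a single bare ID, e.g. 123
    return [str(item).strip() for item in value if str(item).strip()]
```

With something like this in place, the same tap code would accept a native list from meltano.yml and a string supplied some other way, and per-environment differences would stay entirely in the Meltano configuration.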

visch commented 11 months ago

Added comments to the PR. You're going to need to resubmit it, as I can't accept the code in that format.

https://github.com/AutoIDM/tap-indeed/pull/35/files see this PR for the example I referenced above