2DegreesInvesting / tiltDataPipelines

MIT License

Generating toydata for tiltIndicator #112

Open ysherstyuk opened 7 months ago

ysherstyuk commented 7 months ago

Hi @maurolepore,

As agreed by email yesterday, we will be working on generating toydata. It will be included in the data pipelines so that sample data can be generated from the latest data in storage whenever there is a new release of the package.

I would like to ask you to provide some input on the below.

We need data requirements and data specifications for tiltIndicator. Since you previously developed toydata, can you share some insights into your methodology for selecting sample data (e.g., selecting companies so that they can be joined with other datasets)?
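To make the join concern concrete, here is a minimal sketch of one way to sample companies so that the sample still joins with a related dataset. It is only an illustration, not the method used for tiltToyData. The function name `sample_joinable`, the `company_id` key, and the example rows are all hypothetical.

```python
import random


def sample_joinable(companies, related, key="company_id", n=2, seed=42):
    """Sample up to n companies that have matches in `related`,
    then keep only the related rows that belong to the sampled companies."""
    # Only companies that appear in the related table can be joined to it.
    related_keys = {row[key] for row in related}
    candidates = [c for c in companies if c[key] in related_keys]

    # A fixed seed keeps the toy data reproducible between releases.
    rng = random.Random(seed)
    sampled = rng.sample(candidates, k=min(n, len(candidates)))

    # Filter the related table down to the sampled keys.
    keep = {c[key] for c in sampled}
    filtered = [row for row in related if row[key] in keep]
    return sampled, filtered


# Hypothetical input tables (column names are assumptions).
companies = [
    {"company_id": "c1", "country": "DE"},
    {"company_id": "c2", "country": "FR"},
    {"company_id": "c3", "country": "NL"},
    {"company_id": "c4", "country": "ES"},  # has no products, so never sampled
]
products = [
    {"company_id": "c1", "product": "steel"},
    {"company_id": "c1", "product": "cement"},
    {"company_id": "c2", "product": "glass"},
    {"company_id": "c3", "product": "paper"},
]

toy_companies, toy_products = sample_joinable(companies, products)
```

Sampling on the join key first and then filtering every related table by the sampled keys means each toy table can still be joined to the others without producing empty results.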

Please add your input to this ticket so it's easier to trace, and let me know if you need clarification.

Thanks!

maurolepore commented 7 months ago

Hi @ysherstyuk,

> We need data requirements and data specifications for tiltIndicator

tiltIndicator is tested with toy datasets in the tiltToyData package:

Although small, those datasets are overkill for most tests, so tiltIndicator defines even smaller and less realistic toy datasets that are just enough to test specific behaviours. You can see an overview of them all here, and you may want to search for "example_" in the tiltIndicator repo.

> Since you previously developed toydata, can you share some insights into your methodology for selecting sample data (e.g., selecting companies so that they can be joined with other datasets)?

I did not develop the datasets in tiltToyData. @Tilmon and @kalashsinghal did. You may ask them what their strategy was. I only moved them from other places into tiltToyData.

However, I'm about to create -- for the first time -- some datasets for tiltToyData and the conversation is happening at https://github.com/2DegreesInvesting/tiltToyDataPrivate/pull/1 -- which is private because we're discussing licensed data. I think my strategy will be based on the article "Creating realistic data" with the charlatan package or similar art.
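As a rough illustration of the "create realistic but fake data" idea, here is a minimal sketch that generates synthetic company rows, so no licensed values are exposed. It uses only Python's standard `random` module in place of the charlatan package mentioned above. The function name `fake_companies`, the column names, and the country list are hypothetical.

```python
import random
import string


def fake_companies(n, seed=0):
    """Generate n synthetic company rows with stable, made-up identifiers."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    countries = ["DE", "FR", "NL", "ES"]  # hypothetical set of values
    rows = []
    for i in range(n):
        # Random 8-letter name; it carries no real-world information.
        name = "".join(rng.choices(string.ascii_lowercase, k=8)).capitalize()
        rows.append(
            {
                "company_id": f"toy-{i:03d}",
                "company_name": f"{name} Ltd",
                "country": rng.choice(countries),
            }
        )
    return rows


toy = fake_companies(3)
```

Because the identifiers are generated rather than sampled, related toy tables can reuse the same `company_id` values to stay joinable.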

ysherstyuk commented 7 months ago

Hi @maurolepore,

Thanks for the above information! But I thought we agreed that we would be creating toydata for tiltIndicator. Why are you also creating toydata?

Also, I don't have access to the private repo.

maurolepore commented 7 months ago

@ysherstyuk

Do you mean why I'm working on https://github.com/2DegreesInvesting/tiltToyDataPrivate/pull/1?

The thread that resulted in that PR started over 3 weeks ago. You can enjoy the public version of the long conversation here :-). That task was assigned to me but I would happily pass it on to you if you want. You'll need to talk to @AnneSchoenauer though -- as she started it here.

You may also ask @AnneSchoenauer or @Tilmon about your access to licensed data. I don't know who is covered by the license and who is not, so it's better to check with them.