SOCR / SOCRAT-issues

Project management for SOCRAT

fix WorldBank API #150

Closed: alxndrkalinin closed this issue 7 years ago

alxndrkalinin commented 7 years ago

We need to fix the WorldBank API and add a few more options for the dataset. So far I have been using 20.000 cells as a soft limit on dataset size to allow clustering/bootstrapping, so the number of records to retrieve can be calculated as 20.000/#cols. However, the user can still get more records, and SOCRAT will then suggest uniformly subsampling the rows.
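A minimal sketch of the sizing rule above, assuming "20.000" means twenty thousand cells (rows * columns). The function names are hypothetical and not taken from the SOCRAT codebase:

```python
# Soft cap on rows * columns (20.000 = twenty thousand cells, per the thread).
CELL_BUDGET = 20_000


def default_row_count(n_cols: int, cell_budget: int = CELL_BUDGET) -> int:
    """Default number of rows to request so that rows * n_cols <= cell_budget."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    # Floor division keeps the product at or below the budget;
    # always request at least one row even for very wide datasets.
    return max(1, cell_budget // n_cols)


def should_suggest_subsampling(n_rows: int, n_cols: int,
                               cell_budget: int = CELL_BUDGET) -> bool:
    """True when a loaded dataset exceeds the soft cell budget."""
    return n_rows * n_cols > cell_budget


if __name__ == "__main__":
    print(default_row_count(10))                 # 2000
    print(should_suggest_subsampling(5000, 10))  # True
```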

Subtasks:

chartotu19 commented 7 years ago

@alxndrkalinin So if the user imports more than 2000/column, the app should limit the number of rows so that rows * columns == 2000. Did I get that right? Also, is there a particular mathematical technique for subsampling, or is it just Math.random?

alxndrkalinin commented 7 years ago

@chartotu19, sorry, typo, it's 20.000

I was talking about 20.000/#cols as a way to figure out the default number of rows to request from WorldBank. E.g., if the Education Spending category has 10 columns, then you show 2.000 rows as the default. However, it's okay to let the user request more (up to the maximum allowed by the API).
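A sketch of how that default could feed a WorldBank request while still honoring a user override. It assumes the public World Bank Indicators API v2 URL format (`format=json`, `per_page`); `API_MAX_ROWS`, the helper names, and the example indicator `SP.POP.TOTL` are illustrative assumptions, and the real per-request limit should be checked against the API documentation:

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: World Bank Indicators API v2; SOCRAT's actual request code may differ.
WB_BASE = "https://api.worldbank.org/v2/country/all/indicator/"
# Hypothetical upper bound on rows per request; replace with the API's real limit.
API_MAX_ROWS = 20_000


def rows_to_request(n_cols: int, user_rows: Optional[int] = None,
                    cell_budget: int = 20_000,
                    api_max: int = API_MAX_ROWS) -> int:
    """Default to cell_budget // n_cols rows; let the user ask for more, up to api_max."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    default = max(1, cell_budget // n_cols)
    requested = default if user_rows is None else user_rows
    return max(1, min(requested, api_max))


def build_request_url(indicator: str, n_cols: int,
                      user_rows: Optional[int] = None) -> str:
    """Build a JSON request URL whose page size is the chosen row count."""
    params = {"format": "json", "per_page": rows_to_request(n_cols, user_rows)}
    return WB_BASE + indicator + "?" + urlencode(params)


if __name__ == "__main__":
    # 10 columns -> 2000 rows by default, matching the example in the thread.
    print(build_request_url("SP.POP.TOTL", 10))
```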

Also, subsampling is already implemented: e.g., if you try loading the Knee Pain dataset, it'll suggest subsampling.

[Screenshot, 2017-03-07: subsampling suggestion]
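The thread doesn't show how SOCRAT's subsampling is implemented. On the Math.random question, one standard approach is simple random sampling without replacement: pick k distinct row indices uniformly at random, so every k-row subset is equally likely. A minimal sketch with hypothetical names:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def subsample_rows(rows: Sequence[T], k: int,
                   seed: Optional[int] = None) -> List[T]:
    """Uniformly sample k rows without replacement, preserving original row order."""
    if k >= len(rows):
        return list(rows)
    rng = random.Random(seed)  # seeded RNG makes the subsample reproducible
    # random.sample draws k distinct indices; each k-subset is equally likely.
    idx = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in idx]
```

Sorting the sampled indices keeps rows in their original order, which is useful for time-indexed WorldBank data.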

chartotu19 commented 7 years ago

The fix for the missing header has been pushed. Feature-wise, the PR is ready to be merged.