Closed alxndrkalinin closed 7 years ago
@alxndrkalinin So if the user imports more than 2000/column, the app should limit the number of rows to satisfy rows*columns == 2000. Did I get that right?
Also, is there any mathematical technique for subsampling, or just Math.random?
@chartotu19, sorry, typo, it's 20.000
I was talking about 20.000/#cols as a way to figure out a default for how many rows to request from WorldBank. E.g. if the Education Spending category has 10 columns, then you show 2.000 rows by default. However, it's okay to let the user request more (up to the max allowed by the API).
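For illustration, the default described above could be computed roughly as follows. This is only a sketch: the function name `defaultRowCount`, the constant `SOFT_CELL_LIMIT`, and the `apiMaxRows` parameter are hypothetical and not taken from SOCRAT's code, and the API's actual maximum is not stated in this thread.

```javascript
// Hypothetical helper, not SOCRAT's actual code.
// Soft limit on the number of cells (rows * columns), as discussed in this thread.
const SOFT_CELL_LIMIT = 20000;

// Returns the default number of rows to request for a dataset with numCols columns.
// apiMaxRows is an assumed, optional cap for whatever maximum the API allows.
function defaultRowCount(numCols, apiMaxRows = Infinity) {
  if (!Number.isInteger(numCols) || numCols < 1) {
    throw new RangeError("numCols must be a positive integer");
  }
  // Round down so rows * numCols stays within the soft limit; request at least one row.
  const rows = Math.max(1, Math.floor(SOFT_CELL_LIMIT / numCols));
  return Math.min(rows, apiMaxRows);
}

console.log(defaultRowCount(10)); // 2000, matching the 10-column example
```

The value is only a default: the user could still ask for more rows, in which case the cap passed as `apiMaxRows` would be the only limit.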
Also, subsampling is already implemented: e.g., if you try loading the Knee Pain dataset, it'll suggest subsampling.
The fix for the missing header is pushed now. The PR is ready to be merged feature-wise.
We need to fix the WorldBank API and add a few more options for the dataset. So far I was using 20.000 cells as a soft limit for dataset size to allow clustering/bootstrapping, so the number of records to retrieve can be calculated as 20.000/#cols. However, the user can get more records, and then SOCRAT will suggest uniformly subsampling the rows.
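As a rough illustration of uniform row subsampling, here is a minimal sketch based on a partial Fisher-Yates shuffle driven by `Math.random`. It is not SOCRAT's existing implementation, which may work differently; the name `subsampleRows` and its parameters are assumptions made for this example.

```javascript
// Illustrative sketch only; SOCRAT's real subsampling code may differ.
// Picks k rows uniformly at random, without replacement, keeping original row order.
// The random parameter is injectable so a seeded generator can replace Math.random.
function subsampleRows(rows, k, random = Math.random) {
  const n = rows.length;
  const size = Math.max(0, Math.floor(k));
  if (size >= n) return rows.slice(); // nothing to drop, return a copy

  const indices = Array.from({ length: n }, (_, i) => i);
  // Partial Fisher-Yates shuffle: after each step, indices[0..i] is a uniform sample.
  for (let i = 0; i < size; i++) {
    const j = i + Math.floor(random() * (n - i));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  // Sort the chosen indices so the sample preserves the original ordering.
  return indices
    .slice(0, size)
    .sort((a, b) => a - b)
    .map((i) => rows[i]);
}

// Example: cut a 12-column dataset down to the 20.000-cell soft limit.
// const sample = subsampleRows(allRows, Math.floor(20000 / 12));
```

Every subset of `k` rows is equally likely under this scheme, which is what "uniformly subsample" is assumed to mean here.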
Subtasks: