AutoViML / AutoViz

Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Apache License 2.0

Statistically valid sample #82

Closed: borisRa closed this issue 1 year ago

borisRa commented 1 year ago

Hi,

I understood that AutoViz should run statistically valid sampling (by setting 'max_rows_analyzed').

I found this sampling code:

dfte = dfte.sample(nrows, replace=False, random_state=99)
print('        randomly sampled %d rows from read CSV file' %nrows)

This is an ordinary random sample (and not a statistically valid one).

Can you please point me to where I can find the statistically valid sampling code?

Thanks, Boris

AutoViML commented 1 year ago

Hi @borisRa, can you please explain what you are expecting when you say "statistically valid sample"? Are you referring to stratified sampling? In that case, I am not doing that. I currently use random sampling and, if needed, I can change it to stratified sampling. Let me know. AutoVimal
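
For reference, the difference between the random sampling quoted above and stratified sampling can be sketched roughly as follows. This is only an illustration, not AutoViz code: the helper name `stratified_sample`, its `strata_col` parameter, and the demo data are hypothetical, and it assumes pandas 1.1 or later, where `GroupBy.sample` is available.

```python
import pandas as pd


def stratified_sample(df, strata_col, nrows, random_state=99):
    """Return roughly `nrows` rows, keeping each stratum's share of the data.

    Hypothetical helper for illustration; not part of AutoViz.
    """
    if nrows >= len(df):
        # Nothing to sample: the frame is already small enough.
        return df.copy()
    # Draw the same fraction from every group so proportions are preserved.
    frac = nrows / len(df)
    return df.groupby(strata_col, group_keys=False).sample(
        frac=frac, random_state=random_state
    )


# Made-up, imbalanced demo data: 90% label "a", 10% label "b".
demo = pd.DataFrame({"label": ["a"] * 900 + ["b"] * 100, "x": range(1000)})

# Simple random sample, as in the snippet quoted in the issue.
random_part = demo.sample(100, replace=False, random_state=99)

# Stratified sample: each label keeps its original proportion.
stratified_part = stratified_sample(demo, "label", nrows=100)

print(random_part["label"].value_counts())
print(stratified_part["label"].value_counts())
```

A simple random sample is still a valid probability sample; stratifying mainly helps when a rare class could otherwise be under-represented in a small sample.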