ariddell / tatom

Quantitative Text Analysis for the Digital Humanities
https://de.dariah.eu/tatom/

Add discussion of pandas (in lieu of numpy?) #2

Open ariddell opened 10 years ago

ariddell commented 10 years ago

Pandas does make many operations much easier. Need to find sensible ways of integrating mentions of its uses. In principle, I think the tutorials should only require familiarity with the "basic" numpy/scipy stack.

ariddell commented 10 years ago

Anyone who is interested in this tutorial is probably ready for pandas. Upon reflection, I don't think it adds too much complexity.

christofs commented 10 years ago

I also think including pandas is a good idea. DataFrames and label-based slicing are very useful in our context and actually make things a lot more intuitive.
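To make the point about label-based slicing concrete, here is a minimal sketch comparing positional indexing in numpy with label lookup in pandas. The document titles, topic names, and topic shares are all made up for illustration and do not come from the tutorials.

```python
import numpy as np
import pandas as pd

# Hypothetical document-topic shares (rows: texts, columns: topics).
# All values and labels here are invented for illustration.
doctopic = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
titles = ["Austen_Emma", "Austen_Pride", "Bronte_Jane"]
topics = ["topic_0", "topic_1", "topic_2"]

# numpy: we have to translate the label into a row position ourselves.
share_np = doctopic[titles.index("Austen_Pride"), 1]

# pandas: the labels travel with the data, so we can ask for them directly.
df = pd.DataFrame(doctopic, index=titles, columns=topics)
share_pd = df.loc["Austen_Pride", "topic_1"]

# Select all rows whose label starts with "Austen" and average their topic shares.
austen = df.loc[df.index.str.startswith("Austen")]
austen_mean = austen.mean()
```

The numpy version works, but the bookkeeping between `titles` and row positions has to be maintained by hand, which is where label-based selection with `.loc` tends to be easier to read.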

ariddell commented 10 years ago

How should I weave it in? Should there be a separate tutorial showing slicing by label, etc.?

christofs commented 10 years ago

Not sure about this. Maybe a brief section on dealing with the tabular data output by MALLET or NMF could be added somewhere and then referenced in the relevant places: reading such data into a pandas DataFrame, slicing by label, etc. It is a kind of bridge between the topic modeling/NMF part and the visualization part, so it could also fit at the beginning of the Visualization chapter.
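As a rough idea of what such a bridging section might show, here is a minimal sketch that reads a tab-separated document-topic table into a DataFrame and slices it by label. The table layout (a header row, a `doc` column, one column per topic) is an assumption made for this example; the actual MALLET output differs between versions and would need its own parsing step.

```python
import io
import pandas as pd

# Hypothetical tab-separated document-topic table. The header row, the "doc"
# column name, and all values are assumptions made for illustration only.
raw = (
    "doc\ttopic_0\ttopic_1\ttopic_2\n"
    "Austen_Emma.txt\t0.70\t0.20\t0.10\n"
    "Austen_Pride.txt\t0.10\t0.60\t0.30\n"
    "Bronte_Jane.txt\t0.20\t0.20\t0.60\n"
)

# In practice this would be a file path; StringIO keeps the example self-contained.
doctopic = pd.read_csv(io.StringIO(raw), sep="\t", index_col="doc")

# Slice by label: the topic shares of one document.
emma = doctopic.loc["Austen_Emma.txt"]

# The most prominent topic for each document, as a column label.
dominant = doctopic.idxmax(axis=1)
```

A DataFrame like `doctopic` can then be passed on to the plotting code more or less as is, since the row and column labels are available for axis ticks and legends.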