"online workshop", I need help; what is dataset? how can I make one?

fastcloud-cho commented 2 years ago

Hello, this is Dr. Cho in South Korea. I recently purchased access to your "online workshop". I use Python (Spyder) on a Mac. As a novice at Python and Matlab, I am experiencing some frustration. Above all, I need help understanding the first part of each workshop script, "#(0)Load dataset".

Where are the files for "dataset" on my Mac? What does "dataset" mean? How can I open it and see the numbers inside?
Can I have an introduction to the Python command "dataset", as in "yA,yB = dataset.get_data()"?
I would like to apply spm1d principles to my EMG data files (.xlsx or .csv). How should I prepare my files and load them for spm1d in Python? Should I use Pandas?

I think the "online workshop" needs some consideration for novices like me.

jiku-pro commented 2 years ago

Where are the files for "dataset" on my Mac?

Data files are saved in a folder called datafiles.
You can see the contents of the datafiles folder online at: https://github.com/0todd0000/spm1d/tree/master/spm1d/data/datafiles
The location on your computer is ./spm1d/data/datafiles where ./spm1d is the location of the main spm1d folder.
You can find the location of the spm1d folder in Python using the following commands:

import spm1d
print(spm1d)

The print command above will likely return a result like this:

<module 'spm1d' from '/Users/USERNAME/opt/anaconda3/lib/python3.9/site-packages/spm1d/__init__.py'>

This means that the ./spm1d location is: /Users/USERNAME/opt/anaconda3/lib/python3.9/site-packages/spm1d
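As a minimal sketch, the folder can also be found programmatically from a module's `__file__` attribute. The helper name `package_folder` is made up for illustration, and the standard-library `json` package stands in for spm1d so that the snippet runs even where spm1d is not installed; `data/datafiles` is the subfolder described above.

```python
import os
import json  # stand-in for spm1d so that this sketch runs anywhere


def package_folder(module):
    """Return the folder containing a package's __init__.py (hypothetical helper)."""
    return os.path.dirname(os.path.abspath(module.__file__))


folder = package_folder(json)
print(folder)

# With spm1d installed, the same idea should give the datafiles folder:
#   import spm1d
#   datadir = os.path.join(package_folder(spm1d), 'data', 'datafiles')
#   print(sorted(os.listdir(datadir)))
```

The printed folder can then be opened in Finder (Go > Go to Folder...) to browse the files directly.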

What does "dataset" mean?

A "dataset" object in spm1d is a programming tool that gives the user a common, consistent interface to a variety of datasets.
Datasets downloaded from different locations around the internet tend to be stored as different file types (e.g., .csv, .mat, .xlsx) and with inconsistent internal formats (e.g., time as table rows vs. time as table columns); "dataset" objects in spm1d bring consistency to these various third-party datasets.

How can I open it and see the numbers inside?

You can load the .npy and .npz files in the datafiles folder using np.load.
For example:

import numpy as np

filename = '/Users/USERNAME/opt/anaconda3/lib/python3.9/site-packages/spm1d/data/datafiles/ex_grf_speeds.npy'
a = np.load(filename)
print( a.shape )

(60, 10)

This means that the ex_grf_speeds.npy file contains a 60 x 10 (2D) array.
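To actually see the numbers, you can print part of the array. The sketch below is self-contained: it writes two small temporary files with made-up contents (the array sizes and the names `YA` and `YB` inside the .npz file are assumptions for illustration, not the contents of spm1d's files) and then reads them back. A .npy file holds one array, while a .npz file holds several named arrays.

```python
import os
import tempfile
import numpy as np

tmpdir = tempfile.mkdtemp()

# .npy: one array per file
npy_path = os.path.join(tmpdir, 'example.npy')
np.save(npy_path, np.zeros((5, 101)))
a = np.load(npy_path)
print(a.shape)      # array dimensions
print(a[:2, :5])    # first 2 rows, first 5 columns

# .npz: several named arrays per file
npz_path = os.path.join(tmpdir, 'example.npz')
np.savez(npz_path, YA=np.zeros((5, 101)), YB=np.ones((6, 101)))
with np.load(npz_path) as z:
    print(z.files)              # names of the arrays stored in the file
    yA, yB = z['YA'], z['YB']   # read each array by name
print(yA.shape, yB.shape)
```

For a real file, replace the temporary paths with the full path to the file and use `z.files` to find out which array names it contains.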

Can I have an introduction to the Python command "dataset", as in "yA,yB = dataset.get_data()"?

Attached to the dataset object you will find a variety of variables and functions.
In Python, variables and functions are termed "attributes" and "methods", respectively, when they are attached to an object.
To see a list of all attributes and methods, type print( dir( dataset ) ); this will return something like this:

['A', 'STAT', 'Y', 'YA', 'YB', ... 'cite', 'datafile', 'design', 'df', 'dim', 'get_data', ...]

get_data is a method; it is a function that returns the variables needed for the example statistical test.
In the case of yA,yB = dataset.get_data(), two variables (yA and yB) are needed for the test; the test is likely a two-sample t test or a paired t test.
These variables are actually just the YA and YB attributes.
Therefore the following two sets of commands are equivalent:

yA,yB = dataset.get_data()

yA = dataset.YA
yB = dataset.YB

To summarize, the get_data method allows users to access all relevant variables with a single command, and this single-command access is consistent across all of spm1d's datasets, regardless of the experimental design and the data dimensionality.
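The relationship between attributes and methods can be illustrated with a toy class. This is only a sketch: `ToyTwoSampleDataset` and its random data are hypothetical and are not spm1d's actual implementation; only the names `YA`, `YB` and `get_data` are taken from the discussion above.

```python
import numpy as np


class ToyTwoSampleDataset:
    """Hypothetical stand-in for a dataset object; NOT spm1d's actual class."""

    def __init__(self):
        rng = np.random.default_rng(0)
        # Attributes: variables attached to the object
        self.YA = rng.normal(size=(8, 101))    # group A: 8 observations x 101 time points
        self.YB = rng.normal(size=(10, 101))   # group B: 10 observations x 101 time points

    def get_data(self):
        # Method: a function attached to the object
        return self.YA, self.YB


dataset = ToyTwoSampleDataset()
yA, yB = dataset.get_data()

# get_data hands back the very same arrays that are stored as attributes
print(yA is dataset.YA, yB is dataset.YB)

# List the public attributes and methods
print([name for name in dir(dataset) if not name.startswith('_')])
```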

I would like to apply spm1d principles to my EMG data files (.xlsx or .csv). How should I prepare my files and load them for spm1d in Python? Should I use Pandas?

We suggest creating .csv files as follows:
- Rows: observations
- Columns: time points
- No headers or footers
- No empty rows or columns

Then you will be able to load the data using np.loadtxt( filename, delimiter=',' ).

Yes, you can use Pandas. You can also use any other data-loading functionality, including:
- .csv reading: np.loadtxt
- .xlsx reading: we suggest using openpyxl

We strongly suggest using .csv and not .xlsx; it is much easier to load .csv data.
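The suggested layout can be sketched as follows. The data here are random numbers standing in for EMG recordings, and the file name `emg_condition_A.csv` and the array size (6 trials x 101 time points) are assumptions for illustration only. The snippet first writes a file in the suggested format, then loads it back.

```python
import os
import tempfile
import numpy as np

# Made-up data standing in for EMG recordings: 6 trials x 101 time points
rng = np.random.default_rng(1)
emg_A = rng.random((6, 101))

# Write a .csv file: one row per observation, one column per time point, no header
filename = os.path.join(tempfile.mkdtemp(), 'emg_condition_A.csv')  # hypothetical name
np.savetxt(filename, emg_A, delimiter=',')

# Load it back as a 2D array
yA = np.loadtxt(filename, delimiter=',')
print(yA.shape)   # (number of observations, number of time points)

# If a file instead stores time as rows and observations as columns,
# transposing after loading gives the suggested layout:
#   yA = np.loadtxt(filename, delimiter=',').T
```

With Pandas, `pd.read_csv(filename, header=None).to_numpy()` should give the same array. Note that np.loadtxt expects every row to have the same number of columns, so all observations need the same number of time points.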

I think the "online workshop" needs some consideration for novices like me.

We are very sorry to hear that you are not satisfied with spm1d's novice content.
If you would like to request a refund, please contact workshops@spm1d.org.
We are happy to provide novice support in these online forums.
There are many free resources available for learning Python fundamentals. One example is the open-source textbook Data Analysis Practice In Python and Jupyter; there you can learn important fundamental Python skills like reading and writing data, working with arrays, and creating figures.

fastcloud-cho commented 2 years ago

I appreciate "Jiku-pro"'s kind and prompt reply. I'm happy about your "novice support". Two thumbs up! BTW, could you tell me how I can get a normal PDF copy of the open-source textbook "Data Analysis Practice In Python and Jupyter"?

0todd0000 commented 2 years ago

Thank you for the feedback!

There is unfortunately no PDF copy of the open-source textbook. All content is provided as Jupyter notebooks (.ipynb files), for example this section on functions, and the lessons are meant to be standalone modules that do not cross-reference other lessons. It is indeed possible to convert these notebooks to PDF format, but after you start working with the notebooks you will likely find that the notebook format is quite nice to work with.

As one option, you can download all source code. Then you can open the HTML files individually for each section, like this one: ./Lessons/Lesson01/1-IntroductionToJupyter/IntroductionToJupyter.html.

0todd0000 / spm1d

"online workshop", I need help; what is dataset? how can I make one? #215