elegant-scipy / elegant-scipy

1st Edition of Elegant SciPy (O'Reilly Publishers)

file paths within notebook cells #319

Open al-dann opened 7 years ago

al-dann commented 7 years ago

Reading chapter 1 (this may apply to other chapters as well), I came across some cells that read external files.

For example:

# Import gene lengths
filename = 'data/genes.csv'
with open(filename, 'rt') as f:
    # Parse file with pandas, index by GeneSymbol
    gene_info = pd.read_csv(f, index_col=0)
print(gene_info.iloc[:5, :])

The output is like this:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-24-0459b48eead6> in <module>()
      1 # Import gene lengths
      2 filename = 'data/genes.csv'
----> 3 with open(filename, 'rt') as f:

That happens because the notebook file 'ch1.ipynb' is located not in the repository's 'default' (top-level) directory, but in the 'ipynb' subdirectory, so the relative path is looked up under 'ipynb/'.

If the file path is corrected to

# Import gene lengths
filename = '../data/genes.csv'

(adding '../' in front of 'data/genes.csv'), the cell executes correctly.
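
To see why one path fails and the other works, it can help to print where each relative path actually resolves. The snippet below is only a diagnostic sketch; `resolve_from` is a made-up helper name, not part of the book's code, and it assumes the notebook's working directory is the 'ipynb' folder.

```python
from pathlib import Path


def resolve_from(base_dir, relative_path):
    """Return the absolute path that `relative_path` refers to when the
    working directory is `base_dir` (hypothetical helper for illustration)."""
    return (Path(base_dir) / relative_path).resolve()


# Relative paths passed to open() are resolved against the current working
# directory, which for a notebook is normally the folder containing the
# .ipynb file (assumed here to be <repo>/ipynb).
cwd = Path.cwd()
print('working directory:', cwd)
print("'data/genes.csv'    ->", resolve_from(cwd, 'data/genes.csv'))
print("'../data/genes.csv' ->", resolve_from(cwd, '../data/genes.csv'))
```

Run from inside 'ipynb', the first path points at 'ipynb/data/genes.csv', which does not exist, while the second points at the 'data' directory in the repository root.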

Similar issues might occur with style files, e.g.:

# Use our own style file for the plots
import matplotlib.pyplot as plt
plt.style.use('style/elegant.mplstyle')

If we would like to modify the notebooks, there are a few things to think about:

1. how to handle differences (if any) between the printed book and the code examples in the notebooks;
2. how to handle differences (if any) between building the complete book and building an individual chapter;
3. whether other chapters contain similar statements that need the same modification.

jni commented 7 years ago

Hi! The issue is complicated. I tried to get @stefanv to agree to put the notebooks in the root directory, but he thought that would be messy. Maybe now he will change his mind. ;) In the meantime, we specifically made github.com/elegant-scipy/notebooks to avoid this issue, so perhaps try downloading that repo instead? Sorry for the confusion, fix upcoming!

al-dann commented 7 years ago

Thank you for the link to the neighbouring repository.

Well, the proper solution probably depends on what is considered the book's 'primary basis' (I don't know the proper name for that idea): should it be (1) the printed book (or the corresponding PDF file), or (2) the git-based structure of Jupyter notebooks? And what happens when mistakes are found (and how are they corrected, if that option is supported)?

In my particular case I don't mind correcting everything in place and playing with the cells according to my own ideas and experiments, so I don't see this as a big issue at all... but there may be other readers.

jni commented 7 years ago

Oh I don't think we want our readers spending their time debugging working directory issues instead of learning about SciPy, so we will definitely look for a proper solution, so that using the notebooks is reliable. In the meantime I want to make sure things are working for you!

stefanv commented 7 years ago

One option: have the first cell be:

# Change into the repository root directory, so that all
# files are accessible at the correct locations.
import os
if not os.path.exists('.git') and os.path.exists('../.git'):
    os.chdir('..')

jni commented 7 years ago

That won't work if people click on "Download ZIP" instead of cloning.

How about if os.getcwd().endswith('ipynb') and os.path.exists('../data'):?
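
Spelled out as a first cell, that condition might look like the sketch below. The function name `chdir_to_repo_root` is hypothetical, and the check assumes the notebooks live in a directory named 'ipynb' that sits next to 'data', as in this repository. It compares the directory name exactly rather than using `endswith`, so that a folder such as 'my_ipynb' does not match.

```python
import os


def chdir_to_repo_root():
    """Move up one level if we appear to be inside the 'ipynb' directory.

    Hypothetical helper: it relies on the directory layout rather than on
    '.git', so it does not depend on the repository having been cloned.
    """
    in_ipynb = os.path.basename(os.getcwd()) == 'ipynb'
    if in_ipynb and os.path.exists('../data'):
        os.chdir('..')
        return True
    return False


chdir_to_repo_root()
```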

Having said that, I think this is super hacky and an ugly way to start each and every chapter. Note that, without some further hacking, this will be printed in the book as well.

Another way would be for the build commands in the Makefile to add links,

# Link targets are resolved relative to the link's own directory (ipynb/)
ln -s ../data ipynb/data
ln -s ../style ipynb/style

I think I eschewed this originally because it's not cross-platform, but the Makefile won't run on Windows anyway (which is its own problem).
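
For comparison, here is a Python sketch of the same linking idea, which could in principle run where `ln` is unavailable. It is only an illustration: `link_into_ipynb` is a made-up name, the 'data' and 'style' directory names are taken from this thread, and on Windows `os.symlink` may require elevated privileges or Developer Mode.

```python
import os


def link_into_ipynb(repo_root, names=('data', 'style')):
    """Create ipynb/<name> -> ../<name> symlinks (hypothetical helper)."""
    for name in names:
        link = os.path.join(repo_root, 'ipynb', name)
        if os.path.lexists(link):
            continue  # Leave existing links or directories alone.
        # The target is interpreted relative to the link's own directory
        # (ipynb/), hence the leading '..'.
        os.symlink(os.path.join('..', name), link, target_is_directory=True)
```

Calling `link_into_ipynb('.')` from the repository root would make 'data/genes.csv' and 'style/elegant.mplstyle' reachable from inside 'ipynb' without changing any notebook cells.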

stefanv commented 7 years ago

Since we don't do any calling of scripts and other gymnastics, I think that solution is fine.