floydhub / floyd-docs

FloydHub's documentation code. Contributions welcome!
https://docs.floydhub.com/

Confused about how to use a mounted dataset #53

Closed: ankitPagalGuy closed this issue 7 years ago

ankitPagalGuy commented 7 years ago

I read the docs at http://docs.floydhub.com/guides/data/mounting_data/ and have already uploaded my dataset to FloydHub. I mounted the dataset successfully using:

floyd run --data ankit/datasets/cnn_training_set/1 --env keras "python cnn.py"

What I can't figure out is which path I should give for the dataset in my code. I tried various things, such as:

training_set = train_datagen.flow_from_directory( 'ankit/datasets/cnn_training_set/1'...
training_set = train_datagen.flow_from_directory( 'datasets/cnn_training_set/1'...
training_set = train_datagen.flow_from_directory( 'cnn_training_set/1'...

but every time it gave the error FileNotFoundError: [Errno 2] No such file or directory: . What am I doing wrong? Is there anything I'm missing?

saiprashanths commented 7 years ago

As the docs mention, the syntax is --data <data_name>:<mount_point>.

If you provide --data ankit/datasets/cnn_training_set/1:my_data, your data will be mounted at /my_data.
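To make the rule concrete, here is a minimal Python sketch that turns a `--data` value into the path your code should open. `mount_path_for` is a hypothetical helper written only for illustration; it is not part of the floyd CLI and simply encodes the `<data_name>:<mount_point>` to `/<mount_point>` rule described above.

```python
def mount_path_for(data_spec: str) -> str:
    """Return the absolute path where `--data <data_name>:<mount_point>` is mounted.

    Hypothetical helper that mirrors the documented rule; not part of the floyd CLI.
    """
    # Split on the last colon: everything before it is the dataset name,
    # everything after it is the mount point.
    data_name, sep, mount_point = data_spec.rpartition(":")
    if not sep or not data_name or not mount_point:
        # The default location used when no mount point is given is not covered here.
        raise ValueError(f"expected <data_name>:<mount_point>, got {data_spec!r}")
    # The mount point becomes a top-level directory, so the path starts with "/".
    return "/" + mount_point.strip("/")
```

For example, `mount_path_for("ankit/datasets/cnn_training_set/1:my_data")` returns `"/my_data"`, which is the path to use in your code.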

ankitPagalGuy commented 7 years ago

Thank you for the help. Is there any way I can fetch data from a subfolder? For example, this dataset https://www.floydhub.com/swaroopgrs/datasets/dogscats/1 has a folder named sample. If I need to access my_data/sample/train, is that possible, or is there another way? Thank you

saiprashanths commented 7 years ago

If you mount your data using --data ankit/datasets/cnn_training_set/1:my_data, then yes, you can just access it at /my_data/sample/train (note the / at the beginning).
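The leading slash is what matters: without it, Python resolves the path relative to the job's current working directory instead of the filesystem root. A small sketch using the standard-library `posixpath` module (chosen so the example behaves the same on any OS, since the mount paths here are POSIX-style); `/code` is a hypothetical working directory used only for illustration.

```python
import posixpath

relative = posixpath.join("my_data", "sample", "train")
absolute = posixpath.join("/my_data", "sample", "train")

print(relative, posixpath.isabs(relative))  # my_data/sample/train False
print(absolute, posixpath.isabs(absolute))  # /my_data/sample/train True

# A relative path is looked up under the current working directory
# ("/code" is a hypothetical working directory, for illustration only).
cwd = "/code"
print(posixpath.join(cwd, relative))  # /code/my_data/sample/train
# An absolute path ignores the working directory entirely.
print(posixpath.join(cwd, absolute))  # /my_data/sample/train
```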

For more advanced cases (e.g. you want your data to be available at /ankit/a/b/c/d), please see this guide on symlinking: http://docs.floydhub.com/guides/data/symlink_mounted_data/

mckayward commented 7 years ago

@saiprashanths Halfway through writing this when you posted :blush: Posting anyways in case it helps anybody down the road.

@ankitPagalGuy When you run your job using floyd run and pass the --data flag like this:

floyd run --data ankit/datasets/cnn_training_set/1:my_data "python cnn.py"

your code will be sent up to FloydHub and run on a computer that has your data located on its filesystem at /my_data.

Compare this to if the data were on your computer, and you were running the code locally. You might have the data in a folder located somewhere like /home/ankit/datasets/dogscats. When running the code on your computer, your code would need to reference /home/ankit/datasets/dogscats/sample/train to find the sub-folder.

Running the code on FloydHub follows the same exact requirements, except your code is running on a computer that has the data located at /my_data. That means that the sub-folder is located at /my_data/sample/train, and your code needs to look for it there. If your code references /my_data/sample/train, you won't hit any issues.
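One way to keep a single script working both locally and on FloydHub is to read the data root from the environment and fall back to the mount point. This is a sketch under assumptions: `DATA_ROOT` is a hypothetical environment variable you would set yourself on your own machine, and `/my_data` assumes the `:my_data` mount point from the command above.

```python
import os


def data_root(default: str = "/my_data") -> str:
    """Return the directory that holds the dataset.

    DATA_ROOT is a hypothetical environment variable you set yourself when
    running locally; when it is unset (e.g. on FloydHub), the mount point is used.
    """
    return os.environ.get("DATA_ROOT", default)


# Same sub-folder layout in both places; only the root differs.
train_dir = os.path.join(data_root(), "sample", "train")
```

Locally you might run `DATA_ROOT=/home/ankit/datasets/dogscats python cnn.py`; on FloydHub you would leave it unset, so `train_dir` resolves to `/my_data/sample/train`.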

I imagine your case looking something like this:

floyd run --data swaroopgrs/datasets/dogscats/1:my_data --env keras "python cnn.py"

And then, in your code, you'd reference the sub-folder with:

training_set = train_datagen.flow_from_directory('/my_data/sample/train')
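If the path is still wrong, the resulting error typically names only the path that was not found, not what is actually there. A hedged sketch of a pre-flight check you could run first: `check_class_dirs` is a hypothetical helper that confirms the directory exists, lists the contents of its parent if it does not, and returns the class subfolders (Keras' `flow_from_directory` treats each subfolder of the given directory as one class).

```python
import os


def check_class_dirs(directory: str) -> list:
    """Return the class subfolder names in `directory`, or fail with a readable error.

    Hypothetical pre-flight helper; flow_from_directory expects one subfolder
    per class inside `directory`.
    """
    if not os.path.isdir(directory):
        # abspath shows where a relative path actually resolves to.
        parent = os.path.dirname(os.path.abspath(directory))
        # Show what exists one level up, to make path typos obvious.
        found = sorted(os.listdir(parent)) if os.path.isdir(parent) else []
        raise FileNotFoundError(
            f"{directory!r} is not a directory; {parent!r} contains {found}"
        )
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    if not classes:
        raise ValueError(f"{directory!r} has no class subfolders")
    return classes
```

For a folder laid out as `train/cats` and `train/dogs`, `check_class_dirs` returns `['cats', 'dogs']`; only once it succeeds would you pass the same path to `flow_from_directory`.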

Does that make sense? We are in the process of revamping the docs, so if you have more questions or anything is unclear, let us know; it will help us make the docs better.

ankitPagalGuy commented 7 years ago

Thanks @saiprashanths and @mckayward, very detailed explanation, and it works for me. FloydHub is surely a great platform for machine learning engineers. Great work 👍 I will surely let you know if the docs can be made better.

rraj001 commented 6 years ago

I am facing an issue while uploading and using a dataset. I uploaded my data as one of my FloydHub datasets, but I am getting this error while running:

2018-02-17 20:21:09,826 INFO - imlist = os.listdir( 'mydata')
2018-02-17 20:21:09,826 INFO - FileNotFoundError: [Errno 2] No such file or directory: 'mydata'

The part of my code that loads the data is:

import os
from numpy import array
from PIL import Image

path2 = 'mydata'
imlist = os.listdir( 'mydata')
im1 = array(Image.open('mydata' + '\\' + imlist[0]))
m,n = im1.shape[0:2]
imnbr = len(imlist)
immatrix = array([array(Image.open('mydata' + '\\' + im2)).flatten() for im2 in imlist], 'f')
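Two things stand out in the snippet above: `'mydata'` is a relative path (with `--data <dataset>:mydata`, the data would be mounted at `/mydata`), and the hard-coded `'\\'` separator is a Windows convention. A minimal standard-library sketch of just the path-building part, assuming the `:mydata` mount point; `DATA_DIR` and `image_paths` are hypothetical names, and the PIL/NumPy loading would then use these paths unchanged.

```python
import os

# Assumes the dataset was mounted with "--data <dataset>:mydata", which puts it
# at the absolute path /mydata (note the leading slash).
DATA_DIR = "/mydata"


def image_paths(data_dir: str = DATA_DIR) -> list:
    """Return the full path of every file in data_dir, sorted by name."""
    # os.path.join inserts the platform's separator ("/" on Linux). A hard-coded
    # backslash would build names like mydata\img.png, which Linux treats as a
    # single file name rather than a folder plus a file.
    return [
        os.path.join(data_dir, name)
        for name in sorted(os.listdir(data_dir))
        if os.path.isfile(os.path.join(data_dir, name))
    ]
```

Each returned path can go straight into `Image.open(...)` in place of `'mydata' + '\\' + name`.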

rraj001 commented 6 years ago

Please guide me on how to upload and mount the dataset and use it in my code.

saiprashanths commented 6 years ago

@rraj001 Please take a look at our docs: https://docs.floydhub.com/guides/data/mounting_data/. If you have any questions, please post them on the forum to get a quicker response: https://forum.floydhub.com/

Sumitha123 commented 6 years ago

Hi Sai, my dataset is located at sumihere/datasets/fer2013/1/fer2013.csv. When I used this command in my Jupyter notebook:

image_data = pd.read_csv('sumihere/datasets/fer2013/1/fer2013.csv')

I got a FileNotFoundError.

I also tried mounting using:

$ floyd run --data sumihere/datasets/fer2013/1:fer2013 --mode jupyter

and got the following error:

Creating project run. Total upload size: 579.2KiB
Syncing code ...
[================================] 594492/594492 - 00:00:01
Error: You have reached the maximum number of jobs you can run on this plan. Stop other running jobs to submit a new job. You can also upgrade your plan to increase the limit.

Do I need to upgrade? I still have 1 hour left. Thank you in advance!

saiprashanths commented 6 years ago

If you run floyd run --data sumihere/datasets/fer2013/1:fer2013 --mode jupyter, your data is mounted and available at /fer2013 (note the slash at the beginning).

Please see the example in our docs for more details.

Sumitha123 commented 6 years ago

Thanks @saiprashanths !! It is working now 👍 It took me a while to figure everything out, but it's all good now. Kudos to the team at Floydhub 🥇

ajinkyaambatwar commented 6 years ago

Hi @ankitPagalGuy, how did you upload that cats-and-dogs dataset from your local directory to the Floyd environment?

Sumitha123 commented 6 years ago

@ajinkyaambatwar See https://www.youtube.com/watch?v=relTVr694Ko&t=206s from 3:38 to 8:00. Hope this helps.

AnilPavanK commented 6 years ago

I'm having trouble using uploaded datasets in my code; I get the following error:

Job Logs:

2018-10-07 02:02:28 PST File "chatbot.py", line 13, in <module>
2018-10-07 02:02:28 PST lines = open('/Dataset/movie_lines/movie_lines.txt', encoding = 'utf-8', errors='ignore').read().split('\n')
2018-10-07 02:02:28 PST FileNotFoundError: [Errno 2] No such file or directory: '/Dataset/movie_lines/movie_lines.txt'

Dataset location:

floyd run --data skyfall/datasets/chatbot/1:movie_lines
floyd run --data skyfall/datasets/chatbot/1:movie_conversations

Python code:

# Importing the dataset
lines = open('/movie_lines/movie_lines.txt', encoding = 'utf-8', errors='ignore').read().split('\n')
conversations = open('/movie_conversations/movie_conversations.txt', encoding = 'utf-8', errors='ignore').read().split('\n')

Do I need to make any changes in my code?
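Note that the log shows the job opening `/Dataset/movie_lines/movie_lines.txt`, while the snippet opens `/movie_lines/movie_lines.txt`; whichever path the code uses has to match the mount point actually passed to `--data`. A sketch of the reading step that builds the path from the mount point and closes the file afterwards. `read_lines` and its `root` parameter are hypothetical; `root` exists only so the function can be tried outside FloydHub.

```python
import os


def read_lines(mount_point: str, filename: str, root: str = "/") -> list:
    """Read a text file from a mounted dataset and split it into lines.

    Hypothetical helper. With the default root, mount_point "movie_lines" and
    filename "movie_lines.txt" resolve to /movie_lines/movie_lines.txt.
    """
    path = os.path.join(root, mount_point, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; check the mount point given to --data"
        )
    # Same decoding options as the original code; the with-block closes the file.
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read().split("\n")
```

Usage would then be `lines = read_lines('movie_lines', 'movie_lines.txt')` and `conversations = read_lines('movie_conversations', 'movie_conversations.txt')`, assuming both mounts are available in the same job.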