Closed ankitPagalGuy closed 7 years ago
As the docs mention, the syntax is --data <data_name>:<mount_point>.
If you provide --data ankit/datasets/cnn_training_set/1:my_data, your data will be mounted at /my_data.
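To make the mapping concrete, here is a minimal sketch of how a --data value translates into a path. The mount_path helper is hypothetical (it is not part of floyd-cli); it only mirrors the rule stated above, that the part after the colon becomes a directory at the filesystem root.

```python
def mount_path(data_spec: str) -> str:
    """Return the path where a '<data_name>:<mount_point>' spec ends up.

    Hypothetical helper for illustration; not part of floyd-cli.
    """
    data_name, sep, mount_point = data_spec.rpartition(":")
    if not sep or not data_name or not mount_point:
        raise ValueError(f"expected <data_name>:<mount_point>, got {data_spec!r}")
    # The mount point lives at the filesystem root, hence the leading slash.
    return "/" + mount_point.strip("/")


print(mount_path("ankit/datasets/cnn_training_set/1:my_data"))  # /my_data
```

The dataset name (ankit/datasets/cnn_training_set/1) never appears in the resulting path; only the mount point does.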
Thank you for the help. Is there any way I can fetch data from a sub-folder? For example, in this dataset, https://www.floydhub.com/swaroopgrs/datasets/dogscats/1, there is a folder named sample. If I need to access my_data/sample/train, is that possible, or is there another way? Thank you.
If you mount your data using --data ankit/datasets/cnn_training_set/1:my_data, then yes, you can access it at /my_data/sample/train (note the / at the beginning).
For more advanced cases (e.g. you want your data to be available at /ankit/a/b/c/d), please see this guide on symlinking: http://docs.floydhub.com/guides/data/symlink_mounted_data/
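As a rough idea of what symlinking a mounted dataset can look like from inside your own script, here is a sketch using only the Python standard library. The link_data helper is hypothetical, and the linked guide may describe a different procedure (for example, shell commands), so treat this as an illustration rather than the documented method.

```python
import os


def link_data(mounted_dir: str, target: str) -> str:
    """Make `mounted_dir` reachable at `target` through a symbolic link."""
    parent = os.path.dirname(target.rstrip("/"))
    if parent:
        # Create the intermediate directories (e.g. /ankit/a/b/c) if missing.
        os.makedirs(parent, exist_ok=True)
    if not os.path.islink(target):
        os.symlink(mounted_dir, target)
    return target


# Hypothetical usage on the job machine:
# link_data("/my_data", "/ankit/a/b/c/d")
```

After the call, /ankit/a/b/c/d/sample/train would point at the same files as /my_data/sample/train, assuming the job is allowed to write to that location.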
@saiprashanths Halfway through writing this when you posted :blush: Posting anyways in case it helps anybody down the road.
@ankitPagalGuy When you run your job using floyd run and pass the --data flag like this:
floyd run --data ankit/datasets/cnn_training_set/1:my_data "python cnn.py"
your code will be sent up to FloydHub and run on a computer that has your data located on its filesystem at /my_data.
Compare this to if the data were on your computer, and you were running the code locally. You might have the data in a folder located somewhere like /home/ankit/datasets/dogscats. When running the code on your computer, your code would need to reference /home/ankit/datasets/dogscats/sample/train to find the sub-folder.
Running the code on FloydHub follows the exact same requirements, except your code is running on a computer that has the data located at /my_data. That means that the sub-folder is located at /my_data/sample/train, and your code needs to look for it there. If your code references /my_data/sample/train, you won't hit any issues.
I imagine your case looking something like this:
floyd run --data swaroopgrs/datasets/dogscats/1:my_data --env keras "python cnn.py"
And then, in your code, you'd reference the sub-folder with:
training_set = train_datagen.flow_from_directory( '/my_data/sample/train')
Does that make sense? We are in the process of revamping the docs, so if you have more questions or are unclear on anything, let us know--it will help us make the docs better.
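If you want the same script to work both locally and on FloydHub, one option is to pick the data root at runtime. This is only a sketch: find_data_root is a hypothetical helper, and the two candidate paths are the example locations used above, not paths that necessarily exist on your machine.

```python
import os


def find_data_root(candidates):
    """Return the first candidate directory that exists on this machine."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    raise FileNotFoundError(f"none of these data directories exist: {candidates}")


# Hypothetical usage: prefer the FloydHub mount, fall back to the local copy.
# data_root = find_data_root(["/my_data", "/home/ankit/datasets/dogscats"])
# train_dir = os.path.join(data_root, "sample", "train")
# training_set = train_datagen.flow_from_directory(train_dir)
```

The sub-folder part (sample/train) stays the same in both environments; only the root differs.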
Thanks @saiprashanths and @mckayward, very detailed explanation, works for me. FloydHub is surely a great platform for Machine Learning Engineers, great work 👍. I will surely let you know if the docs can be made better.
I am facing an issue while uploading and using the dataset. I uploaded my data to a FloydHub dataset, but I am getting this error while running:
2018-02-17 20:21:09,826 INFO - imlist = os.listdir( 'mydata')
2018-02-17 20:21:09,826 INFO - FileNotFoundError: [Errno 2] No such file or directory: 'mydata'
The part of my code that loads the data is:
path2 = 'mydata'
imlist = os.listdir( 'mydata')
im1 = array(Image.open('mydata' + '\\' + imlist[0]))
m,n = im1.shape[0:2]
imnbr = len(imlist)
immatrix = array([array(Image.open('mydata' + '\\' + im2)).flatten()
                  for im2 in imlist], 'f')
Please guide me on how to upload and mount the dataset and use it in my code.
@rraj001 Please take a look at our docs: https://docs.floydhub.com/guides/data/mounting_data/. If you have any questions, please post them on the forum to get a quicker response: https://forum.floydhub.com/
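For the error above, two things stand out: 'mydata' is a relative path, and the code joins paths with a backslash, which is the Windows separator. Below is a minimal sketch of the listing step, assuming the dataset was mounted with a mount point named mydata (the post does not show the floyd run command, so that name is an assumption). The list_image_paths helper is hypothetical.

```python
import os


def list_image_paths(data_dir):
    """Return sorted full paths of the regular files directly inside `data_dir`."""
    names = sorted(os.listdir(data_dir))
    # os.path.join picks the right separator for the machine running the code.
    paths = [os.path.join(data_dir, name) for name in names]
    return [p for p in paths if os.path.isfile(p)]


# Hypothetical usage, assuming `--data <your_dataset>:mydata` was passed:
# imlist = list_image_paths("/mydata")
# im1 = array(Image.open(imlist[0]))
```

Because the returned entries are already full paths, they can be passed straight to Image.open without any string concatenation.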
Hi Sai,
My dataset is located at sumihere/datasets/fer2013/1/fer2013.csv.
When I used this command in my Jupyter notebook:
image_data = pd.read_csv('sumihere/datasets/fer2013/1/fer2013.csv' )
I got a FileNotFoundError.
I also tried mounting it using:
$ floyd run --data sumihere/datasets/fer2013/1:fer2013 --mode jupyter
I got the following error:
Creating project run. Total upload size: 579.2KiB
Syncing code ...
[================================] 594492/594492 - 00:00:01
Error: You have reached the maximum number of jobs you can run on this plan. Stop other running jobs to submit a new job. You can also upgrade your plan to increase the limit.
Do I need to upgrade? I still have 1 hour left.
Thank you in advance!
If you run floyd run --data sumihere/datasets/fer2013/1:fer2013 --mode jupyter, your data is mounted and available at /fer2013 (note the slash at the beginning).
Please see the example in our docs for more details.
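If you are unsure what ended up under the mount point, a quick way to check from a notebook cell is to walk the directory. This is a sketch with a hypothetical helper, files_under, and it assumes the dataset was mounted at /fer2013 as in the command above.

```python
import os


def files_under(mount_dir):
    """Return sorted paths, relative to `mount_dir`, of every file below it."""
    found = []
    for root, _dirs, files in os.walk(mount_dir):
        for name in files:
            full = os.path.join(root, name)
            found.append(os.path.relpath(full, mount_dir))
    return sorted(found)


# Hypothetical usage inside the Jupyter job:
# print(files_under("/fer2013"))
```

If fer2013.csv shows up in that listing, then pd.read_csv('/fer2013/fer2013.csv') should find it. The dataset name (sumihere/datasets/fer2013/1) is not a path on the job machine, which is why reading from it raises FileNotFoundError.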
Thanks @saiprashanths !! It is working now 👍 It took me a while to figure everything out, but it's all good now. Kudos to the team at Floydhub 🥇
Hi @ankitPagalGuy, how did you upload that cats and dogs dataset from your local directory to the Floyd environment?
@ajinkyaambatwar https://www.youtube.com/watch?v=relTVr694Ko&t=206s From 3:38 - 8:00. Hope this helps.
I'm having trouble using uploaded datasets in my code, with the following errors:
2018-10-07 02:02:28 PST File "chatbot.py", line 13, in
floyd run --data skyfall/datasets/chatbot/1:movie_lines
floyd run --data skyfall/datasets/chatbot/1:movie_conversations
lines = open('/movie_lines/movie_lines.txt', encoding = 'utf-8', errors='ignore').read().split('\n')
conversations = open('/movie_conversations/movie_conversations.txt', encoding = 'utf-8', errors='ignore').read().split('\n')
Do I need to make any changes in my code?
I read the docs here: http://docs.floydhub.com/guides/data/mounting_data/. I have already uploaded the dataset to FloydHub, and I mounted it successfully using
floyd run --data ankit/datasets/cnn_training_set/1 --env keras "python cnn.py"
What I am not able to understand is what path I should give for the dataset in my code. I tried various things, such as
training_set = train_datagen.flow_from_directory( 'ankit/datasets/cnn_training_set/1'...
training_set = train_datagen.flow_from_directory( 'datasets/cnn_training_set/1'...
training_set = train_datagen.flow_from_directory( 'cnn_training_set/1'...
but every time it gave the error "FileNotFoundError: [Errno 2] No such file or directory:". What am I doing wrong? Is there anything I am missing?