SciRuby / daru

Data Analysis in RUby
BSD 2-Clause "Simplified" License
1.04k stars 139 forks source link

Added functionality as per Issue #289 #300

Closed anshuman23 closed 7 years ago

anshuman23 commented 7 years ago

Image Screenshot of two dataframes being loaded from csv files hosted on these links ->

  1. https://raw.githubusercontent.com/anshuman23/ruBayes/master/test/sample.csv
  2. https://raw.githubusercontent.com/anshuman23/ruBayes/master/train/train2.csv

The second csv file is the Iris Dataset.

anshuman23 commented 7 years ago

Hi! I'm kind of new to this, so I'm sorry if I did not do the obvious. I just committed a small test to check url loading capability and had also checked styling using rubocop. I also made the code a little simpler and wrote it around from_csv. I had already committed these changes before I saw your suggestions. I will make the required changes soon and update my PR accordingly to address the issues raised. I will remove the dependency on using a temp csv file.
Thank you!

anshuman23 commented 7 years ago

Hi @zverok , I have made the appropriate changes. If possible I'd like to ask you a few simple questions -

  1. I have created a vcr_config.rb file which I use in the io_spec.rb test -
    
    require 'webmock'                                                                                                                                                        
    require 'vcr'                                                                                                                                                            

VCR.configure do |config|
config.cassette_library_dir = '../fixtures/vcr_cassettes'
config.hook_into :webmock
end


However, I think the main problem here would still be that VCR would be able to give me the HTTP GET object offline from the URL, but that is being done internally by 'from_csv_url' using open-uri. So what do you advise regarding this?

2. To add the vcr and webmock dependencies should I add them to the Gemfile directly using gem 'webmock' and gem 'vcr'?

If you are okay with these changes, I'll commit them to finalize the PR.

Thank you!
zverok commented 7 years ago

@anshuman23 Thanks for your work! But I still feel like code could be made a bit better. Note the two things:

WDYT about just making from_csv just a bit "smarter" -- with open-uri it would be litterally just using CSV.new(open(path_or_url)) here: https://github.com/SciRuby/daru/blob/master/lib/daru/io/io.rb#L206 and there: https://github.com/SciRuby/daru/blob/master/lib/daru/io/io.rb#L214, and make sure nothing broken in tests + providing tests for new behavior (reading from URL).

However, I think the main problem here would still be that VCR would be able to give me the HTTP GET object offline from the URL, but that is being done internally by 'from_csv_url' using open-uri. So what do you advise regarding this?

Please look at VCR usage examples at its docs. VCR+webmock (or just webmock, for such a simple case) for making any "real" network calls mocked with test data.

To add the vcr and webmock dependencies should I add them to the Gemfile directly using gem 'webmock' and gem 'vcr'?

Yes, just in proper groups (development/test, not main dependencies).

anshuman23 commented 7 years ago

Thank you so much for the advice and the feedback. I realized the messiness of the code even when I was coding but I was so impatient to get done that I overlooked the value of the work. I'll take into consideration all that you've said and commit better changes soon. Thanks again!

v0dro commented 7 years ago

I am against introducing any new methods for reading a CSV. You should be able to figure out whether a string contains a URL or a file path by parsing it. All these features should be inside from_csv itself.

anshuman23 commented 7 years ago

@zverok , @v0dro - I have made changes as guided. I hope the latest commits are not lacking in any way. Please let me know your reviews. Thank you :)

anshuman23 commented 7 years ago

@v0dro @zverok - Thanks again for the feedback. I've made these changes as well and pushed the commits. Please let me know if there is still something to be done.

anshuman23 commented 7 years ago

Hi @zverok @v0dro - Is there something wrong with the commits I had pushed? If there are existing problems I'll rectify them although I did take into account all the suggestions and wrote the test for the functionality too

anshuman23 commented 7 years ago

@zverok - I have made the changes. Please review. :)

I will write the tests for #308 in a separate PR

v0dro commented 7 years ago

Quick aside: The title of your PR should clearly state what issue it address explicitly. Listing just the issue number is not very useful for a reader.

v0dro commented 7 years ago

@anshuman23 whats the status of this PR?

v0dro commented 7 years ago

Has been resolved by #334