Rakesh9100 / ML-Project-Drug-Review-Dataset

This is an innovative machine learning project that utilizes patient reviews with many other attributes to analyze and evaluate the effectiveness of drugs.
https://ml-project-drug-review-dataset.streamlit.app
Apache License 2.0
82 stars, 119 forks

Implement Web Scraping Using Multiprocessing #22

Closed: Ashgen12 closed this issue 1 year ago

Ashgen12 commented 1 year ago

Description

Hi, I'm Ashis Baidya, a GSSoC'23 contributor. Many libraries and models are available for scraping drug-related data from the web (in our case, patient reviews and the drugs' effects on the patients). I'd like to implement the scraping with Python libraries that make it more efficient and reduce the time needed to scrape the data: requests, bs4, html5lib, dask, scrapy, and multiprocessing.

SohamD242 commented 1 year ago

Hi, I'm Soham Deshpande, a GSSoC'23 contributor. Many websites don't offer an API, so we can use web scraping to access their data in an easy and structured manner. Python libraries such as BeautifulSoup (bs4), Scrapy, and Selenium are generally used for web scraping. I'd like to apply these libraries together with multiprocessing to speed up the scraping. Multiprocessing helps most when many URLs have to be scraped: the URLs are split across several worker processes and scraped in parallel instead of one after another, which saves time.
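
As a rough illustration of the multi-URL idea above, here is a minimal sketch that distributes URLs across worker processes with `multiprocessing.Pool`. It is not the code from this repository. To stay dependency-free it uses the standard library (`urllib.request` and `html.parser`) in place of requests and bs4, and it assumes, hypothetically, that each review sits in a `<p>` element. The names `ParagraphExtractor`, `extract_paragraphs`, `scrape_url`, and `scrape_many` are made up for this example.

```python
from html.parser import HTMLParser
from multiprocessing import Pool
from urllib.request import urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element (assumed page layout)."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Join the collected pieces and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = []
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the text of all <p> elements in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def scrape_url(url):
    """Fetch one page and return (url, paragraphs). Runs in a worker process."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return url, extract_paragraphs(html)


def scrape_many(urls, worker=scrape_url, processes=4):
    """Apply `worker` to every URL, in parallel when it is worthwhile.

    Results come back in the same order as `urls`. With processes <= 1
    (or at most one URL) the work is done serially, which helps debugging.
    """
    urls = list(urls)
    if processes <= 1 or len(urls) <= 1:
        return [worker(url) for url in urls]
    with Pool(processes=min(processes, len(urls))) as pool:
        return pool.map(worker, urls)
```

A call such as `scrape_many(list_of_urls, processes=8)` should be placed under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start workers by spawning a fresh interpreter. In this sketch one failing URL raises out of `pool.map`, so a real scraper would probably catch errors inside the worker and return them alongside the URL.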

Rakesh9100 commented 1 year ago

@Ashgen12 Okay, assigning this to you. Please proceed.

Rakesh9100 commented 1 year ago

@Ashgen12 Any update on your work?

Ashgen12 commented 1 year ago

@Rakesh9100 Yes, I have updated the work and opened a pull request with the web scraping changes. Here is the link: https://github.com/Rakesh9100/ML-Project-Drug-Review-Dataset/pull/62