bokeh / outreach-programs

A space to coordinate outreach programs like Outreachy and Google Season of Docs.

[Micro task] Explore NYC taxi trips dataset #6

Open pavithraes opened 1 year ago

pavithraes commented 1 year ago

The New York City TLC taxi trips records data is frequently used for creating examples and tutorials for Python data science workflows. You can access the dataset through any of the following ways:

Note that the actual dataset is quite large, so please use a subset of the data or consider reducing it.
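One possible way to reduce the data, sketched with pandas: keep only the columns you plan to plot and take a random sample of the rows. The helper name `reduce_trips` is hypothetical, and the column names are assumed to match the yellow taxi data dictionary; the tiny synthetic frame only stands in for a real month of trips.

```python
import pandas as pd


def reduce_trips(df, columns, frac=0.01, seed=42):
    """Keep only the given columns and a random fraction of the rows.

    Hypothetical helper for illustration; not part of pandas or Bokeh.
    """
    return df[columns].sample(frac=frac, random_state=seed).reset_index(drop=True)


# Synthetic stand-in for one month of trips (column names are assumptions)
trips = pd.DataFrame({
    "trip_distance": [float(i % 10) for i in range(1000)],
    "fare_amount": [2.5 + (i % 10) * 2.0 for i in range(1000)],
    "passenger_count": [1 + i % 4 for i in range(1000)],
})

# Keep two columns and roughly 10% of the rows
small = reduce_trips(trips, ["trip_distance", "fare_amount"], frac=0.1)
```

A fixed `seed` keeps the sample reproducible, which makes it easier for reviewers to get the same plots from the same notebook.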

To complete this micro-task, download and explore a subset of the dataset with Bokeh plots. You can share your Jupyter Notebooks with us as a GitHub gist. As per Bryan's comment here, please open separate issues/PRs with your work, so that we can share feedback individually.

robinokwanma commented 1 year ago

Hi, I was approved for the initial stage of Outreachy. I noticed the datasets are in Parquet format. I need some clarity and guidance: can Bokeh read the Parquet files directly? I was able to read them using pandas. @pavithraes

BhaswatiRoy commented 1 year ago

Hello @pavithraes, in this issue, will we mainly focus on cleaning and preprocessing the data, and on visualizing it with Bokeh using as many important plots as possible?

akanshajais commented 1 year ago

Hello @pavithraes, I am exploring this data for the project Create a blog post series: "Fundamentals of Data Visualization in Bokeh". I will use some Python libraries to summarize and analyze the data after data wrangling and processing, visualize it as per the project requirements, and then use it in the project.

AnishereMariam commented 1 year ago

Hi @robinokwanma, open the link to the dataset website and scroll down; you will see a hyperlink, "Working with PARQUET format", right under the "Data Dictionary and MetaData" subtitle. It explains how to work with the format, and the "trip record user guide" has the full details.

robinokwanma commented 1 year ago

Thank you @anisheremariam . I'm taking a look now

robinokwanma commented 1 year ago

Hi @pavithraes @anisheremariam Does this work? https://gist.github.com/robinokwanma/cc81d1a9f491377f963216848c036d26

That's the link to my GitHub gist for this micro-task. Please review.

Soot3 commented 1 year ago

@pavithraes started with this https://gist.github.com/Soot3/9eaf170fa2048e373e05046222350f54

oluwaseun-tech commented 1 year ago

@anisheremariam thank you for answering @robinokwanma's question; I was about to ask the same thing.

oluwaseun-tech commented 1 year ago

@pavithraes I realized that the dataset is published on a monthly basis. Can we download more than one month's data for the exploration?

AnishereMariam commented 1 year ago

@robinokwanma, I opened the file and noticed that although the code seems fine, some variables are wrongly placed. Please fix that.

@oluwaseun-tech, you are most welcome. Each month has over a million rows of data. If you are sure you can handle multiple months, that's great, but I suggest you use a subset of the data. That is just my opinion.

robinokwanma commented 1 year ago

Thanks, I have made the changes.

oluwaseun-tech commented 1 year ago

Oh! Okay thank you

oluwaseun-tech commented 1 year ago

@pavithraes please take a look at what I have done so far: https://gist.github.com/oluwaseun-tech/ef413dd9658b2123bfc7240652bae90b

BhaswatiRoy commented 1 year ago

Hello @pavithraes @anisheremariam, here is my work on the analysis of NYC Taxi data in a Jupyter Notebook. I have also attached pictures of the output after the code. Reviews would be appreciated.

https://github.com/BhaswatiRoy/Data-Analysis-Projects/tree/main/Bokeh_Plots

JoyclynUjunwaOgbonna commented 1 year ago

@robinokwanma

I had a similar problem. You can use pd.read_parquet to load the dataset.
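As far as I know, Bokeh has no Parquet reader of its own; the usual route is to load the file with pandas and hand the DataFrame to a `ColumnDataSource`. Here is a minimal sketch of that route. The helper names `load_trips` and `distance_vs_fare` are hypothetical, the column names are assumed from the yellow taxi data dictionary, and the three synthetic rows are there only so the sketch runs without a download.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def load_trips(path, columns=None):
    """Read a Parquet file into a DataFrame.

    pandas needs a Parquet engine (pyarrow or fastparquet) installed for this.
    """
    return pd.read_parquet(path, columns=columns)


def distance_vs_fare(df):
    """Build a scatter plot of fare against trip distance from a DataFrame."""
    source = ColumnDataSource(df)  # Bokeh reads columns by name from here
    p = figure(
        title="Fare vs. distance",
        x_axis_label="Trip distance (miles)",
        y_axis_label="Fare (USD)",
    )
    p.scatter(x="trip_distance", y="fare_amount", source=source, size=4, alpha=0.4)
    return p


# Synthetic rows standing in for real trip records
demo = pd.DataFrame({
    "trip_distance": [0.8, 2.1, 5.4],
    "fare_amount": [6.0, 11.5, 22.0],
})
plot = distance_vs_fare(demo)
```

With a real file you would call something like `load_trips("yellow_tripdata.parquet", columns=["trip_distance", "fare_amount"])` (the file name is just an example of a local path) and then pass the plot to `show()`.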

AnishereMariam commented 1 year ago

Hi @BhaswatiRoy, your choice of visualizations is really cool.

AnishereMariam commented 1 year ago

Hi @JoyclynUjunwaOgbonna, have you been able to solve that via the solutions I suggested earlier?

BhaswatiRoy commented 1 year ago

> Hi @BhaswatiRoy, your choice of visualizations is really cool.

thanks @anisheremariam for the feedback, I am on my way to adding more visualizations!

AnishereMariam commented 1 year ago

That is perfect @BhaswatiRoy

AnishereMariam commented 1 year ago

@pavithraes, the link to my work on NYC Data Exploration on GitHub gist is below: https://gist.github.com/anisheremariam/e5f4cb9f46f05f7ba5aa35d449922f53 I would appreciate any reviews and comments on it. Thank you.

Faith-Nchifor commented 1 year ago

Hello @pavithraes, @bryevdv, everyone. I have an issue: my line plot does not display as expected. What can I do? Here is the link to my notebook: https://www.kaggle.com/faithnchifor/nyc-trips-viz

JoyclynUjunwaOgbonna commented 1 year ago

@Faith-Nchifor the link to your notebook is showing a 404 error: "I can't find this page". This usually happens when your Kaggle notebook is private. Could you check whether it is? If so, you might want to make it public so people can access it.

Faith-Nchifor commented 1 year ago

I'm sorry about that @JoyclynUjunwaOgbonna . It's now public

AnishereMariam commented 1 year ago

@Faith-Nchifor, I think it is the method you used. The chart followed the irregular order of the index. Would you consider using the groupby method?

Faith-Nchifor commented 1 year ago

@anisheremariam your method is good. I realized that my plot behaved the way it did because I never sorted the data. It looks just like this one now. Thanks for your input.
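For anyone hitting the same thing, here is a rough sketch of the groupby-then-sort idea discussed above. A line glyph connects points in the order they appear in the data, so unsorted x values make the line jump back and forth. The helper name `trips_per_hour` is hypothetical, the column name `tpep_pickup_datetime` is assumed from the yellow taxi data dictionary, and the pickup times are made up.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure


def trips_per_hour(df, time_col="tpep_pickup_datetime"):
    """Count trips per pickup hour and return them ordered by hour."""
    return (
        df.assign(hour=df[time_col].dt.hour)
        .groupby("hour")
        .size()
        .reset_index(name="trips")
        # groupby already sorts its keys by default; the explicit sort
        # just makes the ordering requirement visible
        .sort_values("hour")
        .reset_index(drop=True)
    )


# Made-up pickup times, deliberately out of order
raw = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2023-01-01 17:05", "2023-01-01 08:10", "2023-01-02 08:45",
        "2023-01-02 23:30", "2023-01-03 17:20", "2023-01-03 17:55",
    ])
})
counts = trips_per_hour(raw)

p = figure(title="Trips per hour", x_axis_label="Hour of day", y_axis_label="Trips")
p.line(x="hour", y="trips", source=ColumnDataSource(counts), line_width=2)
```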

Faith-Nchifor commented 1 year ago

Hello @bryevdv, @pavithraes Here is the link to my gist: https://gist.github.com/Faith-Nchifor/b57ee2140e2dd1ea110d5f17c54626ee My project interest is Create a blog post series: "Fundamentals of Data Visualization in Bokeh"

Ajoke23 commented 1 year ago

Hi @Faith-Nchifor well done

Ajoke23 commented 1 year ago

Hi @BhaswatiRoy, nice analysis, and your choice of visualization is really great.

Ajoke23 commented 1 year ago

If you are having any challenges with the project, ask on this channel. I will be glad to help anyone.

BhaswatiRoy commented 1 year ago

thanks @Ajoke23 for the reviews

Isaakkamau commented 1 year ago

Hello, @pavithraes @anisheremariam please take a look at my first assignment on the analysis of NYC Taxi data on Jupyter Notebook. https://gist.github.com/Isaakkamau/358d2ccff3612d95496972fa67842021

anushka-png commented 1 year ago

Hello everyone, my name is Anushka Sharma and I have made my contribution to the bokeh#1 project. @pavithraes @bryevdv please have a look at my assignment. Here is my gist link: https://gist.github.com/anushka-png/ffd9d83d2b6b46d169c5e510dc4123d9

I have tried to work with two different datasets: the first is TLC Driver 24 hour course and the second is the yellow taxi dataset for October and November. For reference, I have also attached a PDF containing my outputs and other relevant data. I am contributing to a project for the first time. I would appreciate any reviews and comments on it. Thank you.

Faith-Nchifor commented 1 year ago

https://github.com/bokeh/outreach-programs/issues/6#issuecomment-1465110443 Thank you @Ajoke23. Do you have any idea how I can make my plots show in my notebook on GitHub gist?

Azaya89 commented 1 year ago

Hi, here is my submission for the microtask on the project, Create a blog post series: "Fundamentals of Data Visualization in Bokeh." https://github.com/Azaya89/Bokeh-microtask

Attached in a separate images folder are the plots that were generated inline. For some reason, they do not appear inline in the notebook here on github.

Ajoke23 commented 1 year ago

> https://github.com/bokeh/outreach-programs/issues/6#issuecomment-1465110443 Thank you @Ajoke23. Do you have any idea how I can make my plots show in my notebook on GitHub gist?

To show a plot, call show() with the variable name you assigned when creating the plot.

AnishereMariam commented 1 year ago

> Hello, @pavithraes @anisheremariam please take a look at my first assignment on the analysis of NYC Taxi data on Jupyter Notebook. https://gist.github.com/Isaakkamau/358d2ccff3612d95496972fa67842021

@Isaakkamau, that is fine work. Keep it up.

Isaakkamau commented 1 year ago

@anisheremariam thanks a lot, but how many visualizations are we supposed to have? I decided to do one first; I can add others if needed.

PatChizzy commented 1 year ago

Hello @pavithraes @anisheremariam

Please find my contribution for task 1 here. Your feedback would be appreciated.

I added the visualizations as a comment since GitHub gist can't render them from my notebook.

Ajoke23 commented 1 year ago

> Hi, here is my submission for the microtask on the project, Create a blog post series: "Fundamentals of Data Visualization in Bokeh." https://github.com/Azaya89/Bokeh-microtask
>
> Attached in a separate images folder are the plots that were generated inline. For some reason, they do not appear inline in the notebook here on github.

Well done @Azaya89, you did great work.

Ajoke23 commented 1 year ago

> Hello @pavithraes @anisheremariam
>
> Please find my contribution for task 1 here. Your feedback would be appreciated.
>
> I added the visualizations as a comment since GitHub gist can't render them from my notebook.

You did great work. Well done @PatChizzy. Unique and creative visualization.

bryevdv commented 1 year ago

Hi all, thanks for the submissions so far! This is our first time doing Outreachy, so this is a learning experience for us as well! One thing that has become apparent is that it is a bit confusing and difficult to provide individualized comments when all the submissions are mixed together in one place like this! I'd like to ask everyone who has submitted here to open a new issue that has any relevant links, images, etc. for your work. This will allow us to have 1-1 conversations with everyone on their own issue :)

Ajoke23 commented 1 year ago

> Hi all, thanks for the submissions so far! This is our first time doing Outreachy, so this is a learning experience for us as well! One thing that has become apparent is that it is a bit confusing and difficult to provide individualized comments when all the submissions are mixed together in one place like this! I'd like to ask everyone who has submitted here to open a new issue that has any relevant links, images, etc. for your work. This will allow us to have 1-1 conversations with everyone on their own issue :)

For those who might be having trouble figuring this out, you can follow these steps:

  1. Visit the project on GitHub: https://github.com/bokeh/outreach-programs/issues/6
  2. If you are using a desktop, click on the "New issue" button on the right-hand side of the page.
  3. Write a descriptive title and a detailed description in the "Write" section. Include the link to your notebook in the description.
  4. Click on "Submit new issue".

That's all. I hope this helps someone

Azaya89 commented 1 year ago

> > #6 (comment) Thank you @Ajoke23. Do you have any idea how I can make my plots show in my notebook on GitHub gist?
>
> To show a plot, call show() with the variable name you assigned when creating the plot.

I think the issue here is not the code. What I've been able to figure out is that calling output_notebook() in the Jupyter notebook is what renders the plots inline, but exporting the notebook to GitHub won't render them since output_notebook() is not running on GitHub. So it's best to also post the plots as separate images.

I hope this helps.
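Besides posting images, another option that may help is to save each plot as a standalone HTML file that reviewers can open in a browser. My understanding is that GitHub's notebook viewer shows static output only and does not run the JavaScript that Bokeh plots depend on. This is a minimal sketch using `file_html` from `bokeh.embed` with `CDN` resources; the plot data and the output file name are made up.

```python
import tempfile
from pathlib import Path

from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# A small example plot with made-up numbers
p = figure(title="Trips per hour", x_axis_label="Hour of day", y_axis_label="Trips")
p.line([8, 17, 23], [2, 3, 1], line_width=2)

# Render the plot as a complete HTML page that loads BokehJS from the CDN
html = file_html(p, CDN, "Trips per hour")

# Write it somewhere convenient (a temp directory here, purely for illustration)
out_path = Path(tempfile.gettempdir()) / "trips_per_hour.html"
out_path.write_text(html, encoding="utf-8")
```

If you need actual image files, `bokeh.io.export_png` exists as well, but it requires Selenium and a browser driver to be installed, so the HTML route is usually the lighter one.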

Faith-Nchifor commented 1 year ago

> > #6 (comment) Thank you @Ajoke23. Do you have any idea how I can make my plots show in my notebook on GitHub gist?
>
> To show a plot, call show() with the variable name you assigned when creating the plot.

This works quite all right in my Python environment. However, when the notebook has been downloaded, the images do not show.

Faith-Nchifor commented 1 year ago

> > > #6 (comment) Thank you @Ajoke23 Do you have an idea on how I can make my plots to show in my notebook on GitHub gist ?
> >
> > To show plot: You do show(variable name) Variable name assign when creating the plot
>
> I think the issue here is not the code written. What I've been able to figure out is that using output_notebook() in the jupyter notebook is what renders it inline on your notebook but exporting the notebook to github won't render the plots since output_notebook() is not running on github. So it's best to also post the plots as a separate image.
>
> I hope this helps.

Okay @Azaya89. I'm gonna try it out. Thanks

Azaya89 commented 1 year ago

> Okay @Azaya89. I'm gonna try it out. Thanks

You're welcome.