Open JerrZzzz opened 10 months ago
I found that your first graph is a scatter plot, with incident station area on the horizontal axis and time difference on the vertical axis. Did that kind of plot help you analyze the problem more effectively?
After the first scatter plot, you made another scatter plot with more precise units, which puts more colored points at the top of the graph. Besides making the whole graph look better and more eye-catching, do you think this approach helps make the data easier to observe?
According to the original data in the Open Data Toronto database, about 20 columns of data were recorded, and in your research you extracted and analyzed 3 of them. Do you think these three columns are representative, or were they taken at random?
I didn't take them randomly. I wanted to find out the time taken for a unit to arrive on scene, so I chose the area number, which identifies each unit, along with the receive time and the arrive time; these were enough to give me a graph. I would say it is somewhat representative.
When sorting out the data extracted from the original CSV file, I found that you used clean_names(), the function mentioned in Chapter 2. Do you think that tidying the column names first makes the next steps easier and faster?
I tried to follow what the professor does in every example in the book. I think it is somewhat useful: it made my code much more readable. But when I work on my first project outside this course, I might not use this function.
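For readers who have not seen it, clean_names() from the janitor package standardizes column names to snake_case. A minimal sketch; the messy column names below are invented for illustration, not the real ones from the dataset:

```r
# Demonstration of janitor::clean_names(); the messy column names
# here are made up, not the real ones from the Toronto dataset.
library(janitor)

messy <- data.frame(
  "Incident Station Area" = 1,
  "Arrive Time"           = 2,
  check.names = FALSE
)

tidy <- clean_names(messy)
names(tidy)  # "incident_station_area" "arrive_time"
```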
I found that your first graph is a scatter plot, with incident station area on the horizontal axis and time difference on the vertical axis. Did that kind of plot help you analyze the problem more effectively?
I think yes; as our professor says, a graph always provides a good visual on whatever question we are focusing on. But my first scatter plot did not give me much information, because of the outliers and the number of data points.
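A minimal sketch of the kind of scatter plot being discussed, using ggplot2. The data frame, column names, and values here are illustrative assumptions, with one oversized value playing the outlier role:

```r
# Sketch of the scatter plot described above; data and column names
# are invented for illustration.
library(ggplot2)

df <- data.frame(
  incident_station_area = c(134, 221, 332, 445),
  minutes_to_arrive     = c(5.2, 7.8, 4.1, 65)   # 65 plays the outlier role
)

p <- ggplot(df, aes(x = incident_station_area, y = minutes_to_arrive)) +
  geom_point(colour = "firebrick") +
  labs(x = "Incident station area", y = "Time to arrive on scene (minutes)")
```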
What code did you use to interact with Open Data Toronto, and how do you think it helped you analyze the problem? Will it make you better at handling database data?
I browsed Open Data Toronto and chose a download address so that I could let R download the file from that location. I think yes, but handling data is not just downloading and interacting with it, right?
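The download step can be sketched as below. To keep the example self-contained it reads a temporary file, but in the report the path would instead be the Open Data Toronto download address (the opendatatoronto package's list_package_resources() and get_resource() are an alternative route):

```r
# Self-contained sketch of "let R download the file from that location".
# A tiny CSV is written to a temp file and read back; in practice the
# path would be the Open Data Toronto download URL instead.
library(readr)

tmp <- tempfile(fileext = ".csv")
writeLines(c("station,receive,arrive",
             "134,02:15,02:21",
             "221,03:02,03:09"), tmp)

fire_raw <- read_csv(tmp, show_col_types = FALSE)
nrow(fire_raw)  # 2
```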
When you used R code to extract data, organize data, and plot, what coding errors did you encounter? How did you solve them? Would you consider other code to replace the existing code?
The error I remember most specifically happened when I wanted to convert my columns to a time format using as.POSIXct(). Its behavior really confused me; using the help() function really helped.
In the data-cleaning step, I found that you used as.POSIXct(). Is this a bit too verbose for the complexity? Is it possible to break this line of code into shorter lines, or to use other, simpler functions with similar behavior? When running the code, does the system freeze or struggle to process it because of this?
as.POSIXct() is a function applied to a column to convert it into a time format. You can use the help() function to find out the specifics if you want. When I ran the code, it did not freeze.
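A small sketch of how as.POSIXct() turns character timestamps into date-times, and how the response time then falls out with difftime(); the timestamps are made-up examples:

```r
# Convert made-up character timestamps to date-times; the format
# string tells R how the text is laid out.
receive <- as.POSIXct("2018-03-01 14:02:10", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
arrive  <- as.POSIXct("2018-03-01 14:08:40", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# difftime() with units = "mins" gives the response time in minutes.
response_mins <- as.numeric(difftime(arrive, receive, units = "mins"))
response_mins  # 6.5
```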
I noticed that when you analyzed the research problem, you used functions like unique(), head(), and select() that you learned in Chapter 2. When I tested them, they ran completely. But if these functions broke, would the scatter plot fail to display an eye-catching data graph because of it?
I think these Chapter 2 functions are meant to clean the data and show only the data we want. So, personally, I wouldn't expect a scatter plot to behave well with such a small amount of data. I will try to use a more suitable kind of graph next time.
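For reference, what each of those Chapter 2 helpers does, shown on a toy data frame (the column names are illustrative):

```r
# Toy data frame to show unique(), head(), and select() in action.
library(dplyr)

d <- data.frame(area = c(134, 134, 221), mins = c(5, 6, 7))

unique(d$area)    # the distinct station areas
head(d, 2)        # a quick look at the first two rows
select(d, mins)   # keep only the mins column
```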
In the data-cleaning section, you used clean_names() to tidy the variable names extracted from the CSV file. I think this is very efficient for you and the reviewer in understanding what the variables mean. However, in the first and second figures, why is the x-axis for incident station area labeled with numbers such as 0/20/40/80? Do these have any other meaning, or should you rename the x-axis to the region code?
The first graph I drew really got out of hand, because it did not behave the way I expected. The x-axis shows the unit number, which carries no linear meaning; it just represents the area code, which does not follow a linear pattern in any way.
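Since the area codes are labels rather than quantities, one hedged fix is to convert the column to a factor so ggplot2 stops drawing a misleading numeric axis; data and column names below are illustrative:

```r
# Treat the station area as a category, not a number, so the x-axis
# shows one tick per area code instead of a continuous scale.
library(ggplot2)

df <- data.frame(area = c(134, 221, 332), mins = c(5.2, 7.8, 4.1))
df$area <- factor(df$area)   # now a category, not a number

p <- ggplot(df, aes(x = area, y = mins)) +
  geom_point() +
  labs(x = "Incident station area (code)", y = "Minutes to arrive")
```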
At the end of the research, I found it very clear that you summarized the time it took the Toronto Fire Department to arrive on scene during 2018 fires, based on the histogram you made. Do you think the other two scatter plots give you useful information, or help your overall analysis? Could you add more information to the final conclusion and attach the relevant code?
The other two scatter plots really do not give me much information beyond some outliers. I should probably dig deeper into the mean and standard deviation of the whole dataset. It is a good idea to put some relevant code in my conclusion, but I really want to keep the conclusion short.
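The mean and standard deviation mentioned above are one-liners in R; response_mins below is toy data standing in for the real column, with one outlier to show why a robust summary can also help:

```r
# Toy response times (minutes); 65 is a deliberate outlier.
response_mins <- c(4.8, 5.5, 6.1, 6.4, 7.2, 65)

mean(response_mins)     # pulled upward by the outlier
sd(response_mins)
median(response_mins)   # more robust to outliers such as the 65
```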
Regarding style: are R naming conventions easy to apply throughout a research report? In other words, do you think you used R style perfectly, or is there anything that needs improvement?
I cannot say perfectly. I still have a lot to learn, but I can work in R. There is still a lot to improve: I want to add references and analyze the data in a more detailed way.
Did you encounter any system bugs when running the entire code? If so, did you clean up the existing code and record it? For style issues, did you strictly follow an authoritative style guide?
Not really for my code, but I ran into a render problem at the end in RStudio that I could not solve, so I switched to Posit Cloud, which helped.
Regarding documentation: have you updated the relevant documentation after repairing or correcting the code? Please be careful not to affect the existing README.
Yes. I am also missing the reference file; I will make sure I don't miss it next time.
Two as.POSIXct() lines are used in the data-cleaning section. Can these two lines be combined, or one of them omitted?
Good point. I think maybe they can, but I might have to write a for loop, which would make the whole process a bit harder, so I still recommend the old and more "stupid" way.
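One way to merge the two as.POSIXct() lines without a for loop is dplyr::mutate() with across(), which applies the same conversion to several columns at once; the column names here are assumptions, not the real ones:

```r
# Convert both time columns in one mutate() call using across().
library(dplyr)

d <- data.frame(
  receive_time = "2018-03-01 14:02:10",
  arrive_time  = "2018-03-01 14:08:40"
)

d <- d |>
  mutate(across(c(receive_time, arrive_time),
                \(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")))
```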
In the data-cleaning section, what is the relationship between str() and head()?
I used str() to check that the columns are stored as characters, and head() to preview the first few rows.
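Roughly, str() reports each column's type while head() previews the first rows, so the two complement each other when inspecting freshly loaded data; the toy data frame is illustrative:

```r
# str() reports each column's type; head() previews the first rows.
d <- data.frame(area = c("134", "221"), mins = c(5.2, 7.8))

str(d)       # shows that area is chr and mins is num
head(d, 1)   # shows only the first row
```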
For GOOD THINGS: I think your research report has many commendable strengths, such as the selectiveness of the data extraction and the use of two different graphs for the data. I think your composition and color choices are very attractive. What made you think of that?
Thank you. I will try to improve and produce a much better analysis next time. It all comes from what my professor says: a graph is always a much better visual.
Is it really true, as people often say, that in cities like Toronto the time between a fire team receiving a notification and arriving at the scene is under 6 minutes? Some websites say the target is to arrive within 6 minutes and 24 seconds 90% of the time. Now we can find out.
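That 6 minute 24 second claim can be checked directly from the response times: the share of calls at or under the target, and the 90th percentile. The numbers below are toy data, not the real 2018 records:

```r
# Toy response times in minutes; replace with the real column to
# test the 6:24 / 90% target.
response_mins <- c(4.0, 5.1, 5.5, 5.8, 5.9, 6.0, 6.1, 6.3, 6.9, 7.5)

target <- 6 + 24 / 60                 # 6.4 minutes
mean(response_mins <= target)         # share of calls meeting the target
quantile(response_mins, 0.90)         # the time 90% of calls come in under
```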