bsc-iitm / Data-Visualization-Design-CS4001

Graded Assignment 4 (May Term 2023): Redesigning The Hindu Data Point Stories #16

Open Jimmi-Kr opened 1 year ago

Jimmi-Kr commented 1 year ago

For this assignment, we'll use data stories from The Hindu Data Point. Use what you have learned in Week 4 & Week 5 for doing this assignment.

Select a story that you like, study it carefully, and redesign it. Specifically, we want you to focus on understanding the data that powers the story, and how it is visually encoded to tell the intended story. Document your design process, capturing the following:

What is the story the author is trying to tell?
What data is the author using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.
How is it encoded, what problems are with it, and how have you attempted to improve it?

You may choose to expand or curtail the scope of the data used in the story or add an additional dataset to tell the story better. But do not deviate from the main intent of the original story. In other words, it is a redesign exercise, and hence I do not want you to tell a different, unrelated story.

While you should provide a link to the original story, it might be useful to capture and display inline, appropriate parts of the original visualization, and your own design iterations to produce coherent documentation.

For reference, take a look at what the previous batches (2019, 2020, 2021, 2022) did with this assignment.

anant7k commented 1 year ago

Data | A third of Central University teaching positions lying vacant

Anant Kumar 21f1000683

Article Link

Date Published:

July 23, 2023 03:39 pm | Updated 04:29 pm IST

Authors:

Maitri Porecha, Vignesh Radhakrishnan

Intent of the Original Story:

The original story aimed to visually demonstrate key highlights from an RTI query filed with the Minister of Higher Education by activist Chandrashekhar Gaur. The query provided data on the percentage of vacant positions in central universities across India. The authors highlighted the lack of filled positions in these universities and discussed the major factors behind the higher vacancy rates, such as the location and age of the university.

Dataset Description:

The authors used a recent version of the "State-wise Number of Vacant Posts of Teaching and Non-teaching Staff in Central Higher Educational Institutions." The dataset contained information about the state name, university name, number of sanctioned posts, and vacant posts in both teaching and non-teaching positions across all central universities. The data included both categorical and quantitative variables.

Gaps in the Dataset:

The authors focused on data from the top 10 states/central universities and neglected the rest. Utilizing data from all states could provide a more comprehensive level of detail to the visualization.

My Dataset:

Since the dataset link was not provided in the article, I found a similar dataset on data.gov.in, the government's open data website. This data was published on 26 July 2023 and differs from the one used by the authors of the article.

State-wise Number of Vacant Posts of Teaching and Non-teaching Staff in Central Higher Educational Institutions (in reply to Unstarred Question on 15 March, 2023)

Authors Chart 1:

Screenshot 2023-07-28 175741

Improvements in the visualization

The authors used a bar graph to display the total number of sanctioned, filled, and vacant posts in central universities. While the bar graph presented exact numbers, it failed to provide a proportionate visual comparison, which could better justify the article's title.

I've used a sunburst chart to demonstrate the total sanctioned, filled, and vacant positions, offering a clear visualization of the proportion of vacant posts to filled posts.

My Visualization

Total Sanctioned Posts

Authors Chart 2:

Screenshot 2023-07-28 175807

Improvements in the visualization

To show the state-wise distribution of vacant positions in central universities for the top 10 states, the authors used a treemap. However, the visualization could be enhanced by adding color encoding to distinguish the share of each state and by including data from all states for a more comprehensive view.

Hence, I've used a choropleth map, which provides a visual aid to differentiate regions and shows the share of vacant positions more effectively.

To provide an at-a-glance summary of the states with the most vacant positions, I've also created a bubble chart showing how vacant positions are shared between states.
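The state-wise shares behind a choropleth or bubble chart like this can be sketched as a small aggregation. This is only an illustration: the rows follow the columns named in the dataset description (state, university, sanctioned posts, vacant posts), but the state names and numbers are placeholders, and `vacancy_by_state` is a hypothetical helper, not the code used for the charts.

```python
from collections import defaultdict

# Hypothetical rows in the shape the dataset description gives:
# (state, university, sanctioned teaching posts, vacant teaching posts).
# The numbers are placeholders, not values from data.gov.in.
ROWS = [
    ("State A", "University 1", 400, 120),
    ("State A", "University 2", 200, 80),
    ("State B", "University 3", 300, 60),
    ("State C", "University 4", 100, 40),
]


def vacancy_by_state(rows):
    """Return {state: (vacant, sanctioned, vacancy_rate, share_of_all_vacancies)}."""
    totals = defaultdict(lambda: [0, 0])  # state -> [sanctioned, vacant]
    for state, _university, sanctioned, vacant in rows:
        totals[state][0] += sanctioned
        totals[state][1] += vacant

    all_vacant = sum(vacant for _sanctioned, vacant in totals.values())
    summary = {}
    for state, (sanctioned, vacant) in totals.items():
        rate = vacant / sanctioned if sanctioned else 0.0
        share = vacant / all_vacant if all_vacant else 0.0
        summary[state] = (vacant, sanctioned, rate, share)
    return summary


if __name__ == "__main__":
    ranked = sorted(vacancy_by_state(ROWS).items(),
                    key=lambda kv: kv[1][3], reverse=True)
    for state, (vacant, sanctioned, rate, share) in ranked:
        print(f"{state}: {vacant}/{sanctioned} vacant "
              f"({rate:.0%} of its posts, {share:.0%} of all vacancies)")
```

The vacancy rate (vacant divided by sanctioned) suits the color scale of a map, while the share of all vacancies suits bubble size, since the two answer different questions.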

My Visualizations

Statewise Vacant Segregation

statewise circle

Authors Chart 3:

Screenshot 2023-07-28 175825

Improvements in the visualization

The authors used a treemap to display the top 10 universities with the highest levels of vacant positions. However, the proportion was not clearly interpretable due to a lack of color encoding and improper sections in the treemap.

I've used a bubble/circle chart with color encoding, which better differentiates vacant positions across the top 15 universities. Additionally, I created a radial chart showing the top 2 universities with the most vacant positions in each state, thus utilizing the full information provided in the dataset.

My Visualizations

Top 15 University Vacant Positions

Top 2 vacant by State

Authors Chart 4:

Caste-wise data

Overall, only 20% of teacher positions sanctioned under the general category were vacant, compared to 44% among OBC positions, 38% among SC positions and 45% among ST positions. Also notably, 71% of posts sanctioned under the EWS quota and 58% under the Persons with Disabilities were vacant. Among all the reservation groups, General Category positions had the least vacancy share (Chart 4).

Screenshot 2023-07-28 175853

Improvements in the visualization

The authors used a bar chart to visualize the caste-wise split of the share of vacant posts in central universities. While a bar chart is suitable for this information, I noticed that vertical labels with no color encoding made the visualization difficult to interpret. Hence I've used a horizontal bar chart with color encoding, clear labels, and a category legend to make it easier for viewers to understand the information.
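As a rough sketch of the horizontal layout described above, the category percentages quoted from the article can be sorted and drawn as text bars. The helper name `horizontal_bars` and the bar width are assumptions made for illustration; the actual chart was built in a visualization tool, not with this code.

```python
# Vacancy shares (percent of sanctioned teaching posts) quoted in the article text above.
VACANCY_PCT = {
    "General": 20,
    "OBC": 44,
    "SC": 38,
    "ST": 45,
    "EWS": 71,
    "PwD": 58,
}


def horizontal_bars(values, width=40):
    """Return text lines, one bar per category, sorted from highest to lowest."""
    label_width = max(len(label) for label in values)
    lines = []
    for label, pct in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(pct / 100 * width)  # bar length proportional to the percentage
        lines.append(f"{label:<{label_width}} | {bar} {pct}%")
    return lines


if __name__ == "__main__":
    print("\n".join(horizontal_bars(VACANCY_PCT)))
```

Sorting the bars puts EWS at the top and General at the bottom, so the ranking can be read without comparing labels.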

% BY category

Overall, I aimed to improve the visualizations by enhancing their clarity, adding color encoding for better differentiation, and utilizing data from all states for a more comprehensive understanding of the vacancy situation in central universities across India.

In order to get a window into how Central Universities recruit for teaching positions, the UGC has launched a portal — CU-Chayan, which will ensure that all vacancies are advertised by the Universities on a single platform. “Since May, up to 20,000 teaching candidates have registered themselves on the portal,” Prof. Kumar said.

He added that the UGC is also studying recruitment patterns by running back-end analytics to understand the timelines over which universities recruit for teaching positions.

Khushiin commented 1 year ago

Name: Khushee A Namdeo Roll Number: 21f3001500

Link to the Article: https://www.thehindu.com/data/data-chandrayaan-3-mission-how-tough-is-it-to-land-on-the-moon/article67123812.ece

Article published on July 27, 2023 04:06 pm | Updated July 28, 2023 05:37 am IST By: Vignesh Radhakrishnan, Krithika Ganapathy

STORY THAT THE AUTHOR WANTS TO TELL (INTENT OF THE ORIGINAL STORY) :

The article discusses the challenges of lunar missions, with a focus on Chandrayaan-3, India's upcoming moon mission. It highlights historical data indicating that over 40% of moon missions have failed, and the failure rate increases to over 60% for missions involving robotic landers (spacecraft performing controlled landings on the moon's surface). Sample return missions, which collect moon samples and bring them back to Earth, have an even higher failure rate at 67% due to their complexity. The article mentions ISRO's previous lunar mission, Chandrayaan-2, in which the lander named 'Vikram' lost contact with Earth and failed to land on the moon. However, the orbiter mission was successful as it reached the intended lunar orbit. Chandrayaan-3, which launched on July 14, is expected to land on the moon on August 23-24. The mission aims to improve upon Chandrayaan-2 by implementing stronger legs, enhanced power, and an upgraded landing sequence to achieve a successful soft landing on the moon. The article also provides tables and charts showing the failure rates of lunar missions based on different categories. Overall, the article emphasizes the difficulty and risk associated with moon missions, particularly with regards to successful lander missions, and highlights the significance of ISRO's efforts in overcoming these challenges with Chandrayaan-3.

The data used is the number of failures, partial failures, and successes (as percentages) of completed lunar missions undertaken by space agencies and other operators. To convey how high the failure rates are relative to successes, the authors use the total number of lunar missions, the number of successful missions among them, and hence the percentage of failures over the years. The number of failures was lower in the 1970s and 2010s. The 2000s stood out, as all the missions were successful. The authors have used tables to depict the number of failures and successes of lunar missions over the years. They have also taken into account partial failures, where the primary aims weren't achieved but certain milestones were still hit.

DIMENSIONS OF DATA IN THE ARTICLE:

The data depicted in the article consists of various dimensions related to the success and failure rates of lunar missions undertaken by different space agencies and operators since 1958. The dimensions are as follows:

Mission Types: The data classifies lunar missions into different types, including sample return missions, robotic lander missions, crewed lander missions (part of NASA's Apollo program), orbiter missions, flyby missions, and impactor missions.
Success/Failure Rates: The data provides the success and failure percentages for each mission type. It also includes the percentage of partial failures, where some objectives were achieved, but the mission was not completed entirely.
Absolute Numbers: For a more comprehensive view, the data presents the absolute numbers of completed lunar missions for each type, allowing readers to understand the frequency and distribution of these missions across decades.
Decades: The data is further segmented by decades, ranging from the 1950s to the present (with a focus on recent decades). It shows the number of lunar missions and their failure percentages for each decade.
Charts and Tables: The article includes tables and charts to present the data visually, making it easier to understand the trends and patterns in lunar mission success and failure rates over time.

The dimensions of the data help highlight the challenges and complexities involved in lunar missions, particularly with regards to landing on the moon and conducting sample return missions. It also provides insights into the improvements and advancements made by spaceflight agencies in executing successful lunar missions over the years.

GAPS IN THE ARTICLE:

Based on the information provided in the article, there are a few gaps and limitations in the data presented:

Incomplete Charts: The article mentions that some charts may appear incomplete, but it does not specify which specific charts are incomplete or what data is missing from them. This lack of clarity makes it difficult to fully interpret and understand the visual representations of the data.
Lack of Detailed Mission Information: The article provides an overview of the success and failure rates of lunar missions based on broad categories such as sample return missions, robotic lander missions, etc. However, it does not delve into specific details about individual missions, reasons for failure, or the success factors for successful missions. More detailed information on specific missions would provide a better understanding of the challenges faced in lunar exploration.
Limited Historical Context: The data presented covers lunar missions since 1958. While it provides insights into the trends over the years, it does not include the most recent missions beyond the mentioned date in the article (July 2023). Including data from more recent missions would provide a more up-to-date analysis of lunar mission success and failure rates.
Lack of Comparison Across Space Agencies: The data does not provide a detailed comparison of success and failure rates among different space agencies. It would be valuable to understand how different agencies have fared in lunar missions and whether certain agencies have been more successful than others.
Limited Information on Mission Objectives: The article mentions the success rate of Apollo 13's lunar module despite not landing on the moon, but it does not elaborate on the mission's primary objectives or the factors that contributed to its classification as a success. Including more context on mission objectives and success criteria would provide a more comprehensive understanding of mission outcomes.

ESSENTIAL VS IRRELEVANT DATA:

Essential Data:

Overall Lunar Mission Failure Rates: The historical data showing the failure rates of lunar missions (40% overall, 60% for robotic landers, and 67% for sample return missions) is essential as it highlights the challenges and risks associated with moon missions. This data provides important context for understanding the difficulty of exploring and landing on the moon.
Comparison of Mission Types: The comparison of failure rates among different mission types (sample return, robotic lander, crewed lander, orbiter, flyby, and impactor missions) is essential as it allows readers to understand which types of missions are more challenging and prone to failure. This information can help in strategic planning and resource allocation for future missions.
Decade-wise Failure Rates: The data presented for each decade, along with the corresponding failure rates, is essential as it shows the historical progress and improvement in lunar mission success. It helps to identify trends and advancements in space exploration over time.
Success Rates for Crewed Lander Missions: The data indicating a 100% success rate for the seven crewed lander missions under NASA's Apollo program is essential as it highlights the achievements and successes of human space exploration missions.
Orbiter and Flyby Mission Failure Rates: The lower failure rates for orbiter and flyby missions (36.7% and 24.6% respectively) are essential as they demonstrate the relative success and viability of these mission types.

Irrelevant Data:

Charts Appear Incomplete: The mention of incomplete charts without specifying which charts are incomplete is irrelevant as it does not provide any actionable information or insights.
Specific Mission Names: The article mentions specific mission names like Chandrayaan-2 and Chandrayaan-3, but it does not provide detailed information about these missions. For the purpose of understanding lunar mission success rates in general, the specific names of individual missions are less relevant without accompanying details.
Lack of Recent Data: The article does not include data beyond July 2023, which may be irrelevant for readers seeking the latest information on lunar missions.
In summary, the essential data in the article includes historical lunar mission failure rates, comparison among mission types, decade-wise failure rates, success rates for crewed lander missions, and the failure rates for orbiter and flyby missions. On the other hand, the irrelevant data includes the mention of incomplete charts, specific mission names without detailed information, and the lack of recent data beyond July 2023.

ENCODING OF THE DATA:

The data in the article is primarily presented in a tabular format, which is a common way to represent structured information. Each table contains rows and columns with relevant data points, such as success/failure rates, mission types, and decades. Additionally, there are charts used to visually represent some of the data.

Author’s Chart 1:

IMPROVEMENTS IN THE CHART 1:

The Authors have used a table to represent the percentages of failures, partial failures and success of lunar missions. Here are a few problems with the tabular kind of representation that the Authors have used:

Insufficient Differentiation: The table includes three categories: failure, partial failure, and success, but it does not explain the difference between "failure" and "partial failure." Without clear definitions, it might be confusing for readers to interpret the data accurately.
Limited Comparisons: The table does not allow for easy comparison between different mission types or decades, as the data is presented in separate rows. Comparing the success rates across different categories might be more intuitive if the data were presented side by side.
Lack of Visual Aids: The data is presented in a tabular format, which can be less visually engaging compared to charts or graphs. Visual aids like bar charts or pie charts can help readers quickly grasp the distribution of success and failure rates.

So here is how I have improved the representation:

Use Visualizations: Convert the data into visually engaging charts or graphs to facilitate better understanding.
Use Color Coding: Incorporate color coding to highlight specific trends or patterns in the data, making it visually appealing and easier to interpret.

Another layer of improvement that I have added is to calculate the estimated probability of a lunar mission succeeding, which came out to be as follows:

Hence my visualization was a line graph that showed the trends and patterns in the various elements including estimated probability or percentage of success.
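The post does not say how the estimated probability was computed. One simple reading, shown here only as an assumption, is a relative-frequency estimate: successes divided by all completed missions. The function name and the counts below are hypothetical placeholders, not figures from the article's table.

```python
def success_probability(successes, partial_failures, failures):
    """Relative-frequency estimate: successes divided by all completed missions."""
    total = successes + partial_failures + failures
    if total == 0:
        raise ValueError("no completed missions to estimate from")
    return successes / total


# Hypothetical counts per mission type (success, partial failure, failure);
# placeholders only, not the figures from the article's table.
EXAMPLE_COUNTS = {
    "Mission type X": (6, 1, 3),
    "Mission type Y": (2, 0, 6),
}

if __name__ == "__main__":
    for mission_type, counts in EXAMPLE_COUNTS.items():
        print(f"{mission_type}: {success_probability(*counts):.1%}")
```

Under this reading, partial failures count against success. Treating them as half successes would be an equally defensible choice and would shift the line upward.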

Author’s Chart 2:

IMPROVEMENTS IN CHART 2:

Again, the Authors have used a table to represent the percentages of failures, partial failures and success of lunar missions. Here are a few problems with the tabular kind of representation that the Authors have used:

Insufficient Differentiation: The table includes three categories: failure, partial failure, and success, but it does not explain the difference between "failure" and "partial failure." Without clear definitions, it might be confusing for readers to interpret the data accurately.
Limited Comparisons: The table does not allow for easy comparison between different mission types or decades, as the data is presented in separate rows. Comparing the success rates across different categories might be more intuitive if the data were presented side by side.
Lack of Visual Aids: The data is presented in a tabular format, which can be less visually engaging compared to charts or graphs. Visual aids like bar charts or pie charts can help readers quickly grasp the distribution of success and failure rates.

So here is how I have improved the representation:

Use Visualizations: Convert the data into visually engaging charts or graphs to facilitate better understanding.
Use Color Coding: Incorporate color coding to highlight specific trends or patterns in the data, making it visually appealing and easier to interpret.

I have added another layer of improvement by using a stacked bar chart, for the following reasons:

Visual Comparison: A stacked bar chart allows for easy visual comparison between the different mission categories, as each bar is divided into segments representing failure, partial failure, and success percentages. This visual representation enables readers to quickly grasp the relative proportions of each outcome for various mission types.
Highlighting Trends: Stacked bar charts can highlight trends over time or across different mission types. For example, readers can see at a glance whether certain mission categories have a higher proportion of success or failure rates compared to others.
Clarity of Categories: In Table 2, the data is presented in rows, which might lead to confusion when comparing different mission types. With a stacked bar chart, each mission type can be represented by a separate bar, making it easier to distinguish and comprehend the data.
Clear Proportional Relationships: The stacked bar chart's vertical orientation makes it straightforward to understand the proportional relationships between the different outcomes for each mission category. The length of each segment directly corresponds to the percentage it represents.
Enhancing Visual Appeal: Stacked bar charts are visually appealing and can capture readers' attention more effectively than plain tabular data. They make the data more engaging and accessible.
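To make the "segment length corresponds to percentage" point concrete, here is a minimal sketch of how one stacked bar could be laid out from outcome counts. `stacked_segments` is a hypothetical helper and the counts are placeholders; a charting tool does this computation internally.

```python
def stacked_segments(counts):
    """Turn {outcome: count} into [(outcome, start_pct, end_pct)] for one stacked bar."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a bar from zero missions")
    segments, start = [], 0.0
    for outcome, count in counts.items():
        end = start + 100.0 * count / total  # each segment begins where the last ended
        segments.append((outcome, start, end))
        start = end
    return segments


if __name__ == "__main__":
    # Placeholder counts for a single mission type, not the article's figures.
    example = {"Failure": 5, "Partial failure": 1, "Success": 4}
    for outcome, start, end in stacked_segments(example):
        print(f"{outcome}: {start:.0f}% to {end:.0f}%")
```

Because every bar spans 0 to 100, mission types with very different mission counts remain comparable, which a table of raw counts does not offer.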

With tabular representation as follows:

Another layer of improvement is as follows:

Here I have represented failures as a bar chart, combined with a line chart depicting successes. The different lunar missions are arranged so that the decreasing trend of failures and the increasing trend of successes are clearly visible and self-explanatory.

Author’s Chart 3:

Observations: The 1950s had a very high failure rate across mission types. All the orbiter missions failed. The overall failure rate was 84.6%. The 1960s saw the highest number of moon missions (74) for any decade thus far. But the overall failure rate remained relatively high at 62.2%. However, note that in the 1960s, the orbiter failure rate reduced to just 40%, hinting at spaceflight agencies getting better at such missions.

Again, the Authors have used a table to represent the percentages of failures and total missions. Here are a few problems with the tabular kind of representation that the Authors have used:

Insufficient Differentiation: The table includes three categories: failure, partial failure, and success, but it does not explain the difference between "failure" and "partial failure." Without clear definitions, it might be confusing for readers to interpret the data accurately.
Limited Comparisons: The table does not allow for easy comparison between different mission types or decades, as the data is presented in separate rows. Comparing the success rates across different categories might be more intuitive if the data were presented side by side.
Lack of Visual Aids: The data is presented in a tabular format, which can be less visually engaging compared to charts or graphs. Visual aids like bar charts or pie charts can help readers quickly grasp the distribution of success and failure rates.

So here is how I have improved it: I have used a vertical bar chart for better comparisons, visual encoding and visual appeal.

I have added another layer of improvement by clearly showing the pattern and relation between the number of lunar missions and the % of failures. I have used an area chart to represent the data in Table 3, showing the total number of lunar missions completed by decade and the trend over time. Here's how an area chart can be helpful in representing the data:

Time-based Visualization: Area charts are ideal for displaying data over time, making them suitable for representing the total number of lunar missions completed by decade. Each data point corresponds to a specific decade, and the area chart connects these data points, allowing readers to observe the trend and changes in the number of missions over time.
Highlighting Trends: By using an area chart, trends in lunar missions over the decades can be easily identified. Readers can quickly observe whether the number of missions has been increasing, decreasing, or fluctuating over time.
Emphasizing Accumulation: An area chart represents data points as areas, and the cumulative effect of these areas emphasizes the overall growth or decline of lunar missions. This visual effect is particularly useful when analysing the total number of missions completed in each decade.
Comparing Across Decades: With an area chart, readers can compare the total number of missions completed in different decades more intuitively than with a table. The visual representation helps in identifying which decades had higher or lower numbers of lunar missions.
Adding Context: An area chart can provide a better context for understanding the overall progress of lunar exploration. It complements the data in Table 3 by illustrating the changes in the number of missions completed in a continuous and smooth manner.
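The relation between mission counts and failure percentages can be made concrete with a small calculation. The sketch below uses only the 1960s figures quoted in the observations above (74 missions, 62.2% failure rate). `implied_failures` is a hypothetical helper, and the result is an approximation because the published percentage is rounded.

```python
def implied_failures(total_missions, failure_pct):
    """Approximate number of failed missions implied by a total and a failure percentage."""
    return round(total_missions * failure_pct / 100)


# 1960s figures quoted in the observations above: 74 missions, 62.2% failure rate.
print(implied_failures(74, 62.2))  # prints 46
```

Plotting implied failure counts alongside totals shows whether a high percentage comes from many failures or simply from few missions in that decade.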

Author’s Chart 4:

Here is how I have improved the Author’s representation. I have used a pie chart for the following reasons:

Proportional Comparison: A pie chart is excellent for showing proportional comparisons, making it easy to understand the distribution of lunar missions across different decades. Each slice of the pie represents a decade, and its size directly corresponds to the proportion of missions completed in that period.
Visualizing Percentage Share: The pie chart allows readers to visualize the percentage share of lunar missions for each decade relative to the total number of missions completed. It provides a clear and intuitive depiction of the significance of each decade in terms of the overall lunar exploration efforts.
Highlighting Dominant Decades: The pie chart readily highlights which decades had a larger number of missions by having larger slices. This visual emphasis can help readers quickly identify the most active or pivotal periods in lunar exploration.
Comparison of Relative Sizes: By comparing the sizes of different slices, readers can easily observe the relative difference between the number of missions in various decades. This is particularly useful for identifying trends or shifts in lunar missions over time.
Concise Representation: A pie chart provides a concise representation of the data and is well-suited for displaying a small set of categories, making it ideal for summarizing the number of lunar missions completed in a handful of decades.

FURTHER EXPLORATIONS:

I separately explored the number of failures, partial failures, successes and expected probability to succeed within a single representation, also in an exploratory manner as follows:

Link for the above interactive visualization is as follows: https://infogram.com/column-chart-1ho16voe7e8o84n

I have also done another exploratory visualization. This is like a dashboard where each graph uniquely represents data from various years. I have also shown the turning points of various missions over the years. The visualization looks as follows:

Link to interactive visualization: https://infogram.com/lunar-launch-mission-1h8n6m3dmdxoz4x?live

THANK YOU!

ghost commented 1 year ago

International impact of India's rice export ban

Name: Prateek Ganguli Roll: 21f1004044

Original (July 28, 2023): The Hindu Data | Who does India’s rice export ban impact the most? Archive.org (July 29, 2023), Archive.is (July 30, 2023)

Intent of Original Article

The author attempts to illustrate why and how Non-Resident Indians (NRIs), primarily living in neighboring countries, have been more affected by the rice export ban compared to those living in the U.S.

The article thus tries to shed light on the fact that, despite the reporting bias of other outlets that have highlighted the ban's impact on the U.S., it is NRIs in neighbouring countries like Nepal and Bangladesh, and even in some African countries, who have been more affected by the ban.

The article also tries to illustrate why the Indian government is banning such seemingly profitable exports, citing lack of farming land area for growing just rice.

Analysis of Original Visualizations

Data Used:
1. Quantity of various kinds of rice exported by India per year.
2. Average quantity of non-basmati white rice bought by the top 50 importers per year.
3. Average quantity of semi/wholly milled rice (all types) bought per year.
4. Retail price of rice for select Indian cities.
5. Farming area covered under rice for select Indian cities.
Type of Data: All data used and visualized is historical, time-series numeric data, ordered chronologically and grouped by country or city as applicable.
Extent of the Data: The data capture range falls within Financial Years 2018 and 2023.
Dimensions of the Data: The data is multidimensional, with one axis capturing time (mostly in years) and the other axis capturing the country or city group the data is for.
Gaps in the Data: Data labels for some countries are simply missing in charts 2 and 3, possibly due to lack of space in the chosen visualization. The exact numerical data from the Commerce Ministry, Agriculture Ministry, COMTRADE, and Department of Consumer Affairs is also not linked anywhere in the article.
Essential and Irrelevant Data: Chart 2, while being more numerically informative than chart 3 (due to being able to more easily compare bars than bubbles) is completely irrelevant given that chart 3 is essentially the same data as chart 2, but with basmati included.
Encoding of the Data: The data is represented as line, bar or bubble charts in charts 1, 2 and 3. Charts 4 and 5 are plain tables with numeric data.

Critique of Original Visualizations

Chart 1: While all the legends and units for the axis are described in words above the chart, the chart itself contains no labels or units, making it difficult for the chart to stand on its own.
Chart 2: Grouping of countries by color is seemingly random (brick red denotes West Asia and North Africa; while geographically close, why club them together and not segregate by continent?)
Chart 3: Chart 3 makes chart 2 redundant by being a superset of the data visualized. Chart 3's use of bubbles is also far less legible than chart 2.
Chart 4: While the heatmap table is perfect on its own, it suffers from the same lack of units and labels that chart 1 suffers from.
Chart 5: No legend for the color scale is provided, although the color gradation is fairly intuitive on its own. Not all columns seem to be subjected to this coloring however.

My own Visual Re-telling of the Story

Chart 1: Quantity of the three different types of semi/wholly milled rice exported by India (in tonnes) over time.
Chart 2 & 3: Average quantity of rice (all types) bought by the top importers per year between FY19 and FY23.

Rice Import from India Link

Chart 4: Retail price of rice (₹ per kg) for select Indian cities.
Chart 5: Average farming area covered under rice (FY18 to FY23) compared to the actual area covered in FY24.
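The comparison in Chart 5 (a multi-year average against the latest year) comes down to a percent deviation from a baseline. The sketch below is an assumption about how such a comparison could be computed; the function name and the area values are placeholders, not Agriculture Ministry figures.

```python
from statistics import mean


def deviation_from_average(history, current):
    """Percent difference between the current value and the mean of past values."""
    baseline = mean(history)
    return 100.0 * (current - baseline) / baseline


if __name__ == "__main__":
    # Placeholder area values for FY18 to FY23 and for FY24 (arbitrary units).
    past_area = [100, 110, 90, 105, 95, 100]
    fy24_area = 92
    print(f"{deviation_from_average(past_area, fy24_area):+.1f}% vs. the six-year average")
```

A signed percentage like this maps naturally onto a diverging color scale, with shortfalls and surpluses in opposite hues.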

iSarthakGautam commented 1 year ago

Sarthak Gautam
21f1000864

Data | Who does India’s rice export ban impact the most?

Article link

Date Published: July 28, 2023 07:43 pm | Updated July 29, 2023 07:16 pm IST
Author: JASMIN NIHALANI

Original intent of the story:

The story explores the impact of India's decision to ban the export of non-basmati white rice on various countries. It highlights the panic buying of rice by Non-Resident Indians in the United States and the countries that are most affected by the ban. It also touches upon the import dependency of different countries on India for rice.

Data set used:

The raw dataset used for the visualisations isn't explicitly mentioned or linked. The sources are vaguely attributed to the Commerce Ministry, Agriculture Ministry, COMTRADE, and the Department of Consumer Affairs. So, based on the visualisations and the article text, the data used in the story includes:

Rice Export Data: It includes the quantities of different types of rice, such as basmati, non-basmati and parboiled rice, exported by India in the last 8 fiscal years.
Data about non-basmati white rice bought by different countries from India in FY23.
Average quantity of semi/wholly milled rice (all the three types together) bought per year between FY19 and FY23.

Details of the Data:

Type of Data:

Quantitative data on rice exports and imports.

Extent of the Data:

The data covers the last eight fiscal years (up to FY23) for exports, and FY19 to FY23 for imports from India by different countries.

Dimensions of the Data:

The data includes quantities of different types of rice exported, countries importing non-basmati white rice, and the average quantity of semi/wholly milled rice imported by various countries.

Gaps in the Data:

Duration and Flexibility of the Export Ban: The story mentions that the Indian government decided to ban the export of non-basmati white rice to ensure adequate availability in the domestic market and control rising prices. However, the duration of the ban and whether it is a temporary measure or a permanent one is not specified. Understanding the timeframe of the ban is essential to assess its impact accurately.
Percentage share of different rice types: The data fails to mention the percentage share of each type in each fiscal year.
No Geographic Representation: The story discusses the impact of the ban on various countries, but there are no geographical representations (e.g., maps) to show the locations of these countries and their import quantities. Redesigning the visualisations to include maps would make the story more geographically engaging.

Essential vs. Irrelevant Data:

The essential data includes the export quantities, countries affected by the ban, and import dependency percentages. The information about panic buying in the U.S. is relevant to understanding the reporting bias but might not be directly related to the ban's impact.

Dataset search

I tried to find a dataset on data.gov.in and various other forums, but most of them were updated only till 2019, so I referred to COMTRADE.

Working:

Duration and Flexibility of the Export Ban:

I could not find explicit details about the duration of this ban (it is not mentioned even in the press release). A foreign news website states that a similar export ban on de-oiled rice runs till 30 November (News link), but this can't be used to infer the end date of the ban on normal rice.
Another possibility is that the ban will be lifted once paddy crops recover from the heavy losses caused by rain.

Illustration 1 (Percentage share of different rice type)

Author's working

Improvements in the visualisation

While the author is able to show the trend line for the different types, the chart does not show their actual percentage share.

While trendline charts show the overall trend, they may not provide a clear comparison of the individual components (rice types in this case) in the data. By transforming the trendline chart into a stacked bar chart, readers can easily compare the export quantities of the different types of rice for each year. The stacked bars allow for a more effective visual assessment of the contribution of each rice type to the total exports.

The stacked bar chart offers clarity when interpreting fluctuations in individual rice type exports. Sudden changes in segment height can quickly draw attention to significant shifts in market demand or policy-driven decisions, such as the ban on non-basmati white rice exports.
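A stacked bar chart shows shares directly only if the quantities are first normalised per year. The following is a minimal sketch of that step, assuming made-up export quantities (the figures and the names `exports` and `percentage_shares` are hypothetical and are not taken from the article or from COMTRADE):

```python
# Hypothetical export quantities in million tonnes; NOT figures from the article.
exports = {
    "FY21": {"basmati": 4.0, "non_basmati_white": 6.0, "parboiled": 10.0},
    "FY22": {"basmati": 3.0, "non_basmati_white": 9.0, "parboiled": 8.0},
}


def percentage_shares(by_year):
    """Convert absolute quantities into each rice type's percentage share per year."""
    shares = {}
    for year, quantities in by_year.items():
        total = sum(quantities.values())
        shares[year] = {
            rice_type: round(100 * qty / total, 1)
            for rice_type, qty in quantities.items()
        }
    return shares


shares = percentage_shares(exports)
for year, row in shares.items():
    print(year, row)
```

Each year's shares add up to 100, so they can be fed to a 100% stacked bar chart in Excel or Flourish.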

This can also be done using a grid of doughnut or pie charts, but that won't show the overall trend.

Doughnut charts are effective for visualising the composition and distribution of a whole into its parts. In the context of rice exports, each doughnut chart can represent a single year, and the segments within the doughnut would represent the different types of rice exported, providing a clear view of their individual contributions.

Placing each year's doughnut chart side by side in a grid allows for easy year-wise comparison of rice exports. Readers can quickly identify any changes in the distribution of rice types across different years.

Doughnut charts display the percentage distribution of each rice type within a single year, which can help readers understand the relative popularity and market share of different rice varieties for each specific period.

Illustration 2

The author uses a radial tree. It fulfils its purpose, but it does not show the geography or demographics of the countries that need the rice the most, i.e. those affected the most.

So a geographic visualisation such as a cartogram would be a great choice to depict the most affected countries and also their respective demographics.

Illustration 3

Again, the author used a categorical heat map, which is fine, but a cartogram can aid demographic visualisation.

The rice price of each state could be approximated using the price in its capital.

2018 to 2023

----->

Or using trend lines to show the trend and comparison

Reason for increasing price:

One of the main reasons for the price rise is climate change-related disasters: extreme flooding in the north and relatively poor rainfall elsewhere have impacted rice sowing this year. This is supported by the rainfall graph from IMD.

This, along with various other factors, also reduces the actual area usable for paddy cultivation, which is clearly depicted in the visual:

Conclusion

Including maps in the visualisation helps in understanding why a country requires more rice imports. For example, Nepal, being a mountainous region, can't have paddy fields, so it relies on rice imports and is in turn affected the most by India's rice ban.

Also, the consumption of rice is higher in the coastal states of India, which shows up as an increase in price due to higher demand (demand-supply principle).

A trend-line visualisation for price gives an idea of the price trends, and a stacked bar chart or pie chart provides a comparative figure for the types of exports.

The rest of the visualisations are good and convey the story efficiently.

Tools Used:

Excel, Flourish

upatil98 commented 1 year ago

Uday Patil

21f1003481

Chandrayaan-3 mission: How tough is it to land on the moon?

Article link

Authors: Vignesh Radhakrishnan, Krithika Ganapathy

Original intent of the story:

The article aims to highlight the challenges and risks of moon missions while presenting historical data on success rates. It emphasizes that over 40% of moon missions have failed and that robotic lander missions have a failure rate exceeding 60%. Crewed lander missions, particularly NASA's Apollo program, achieved a 100% success rate. The authors also note ISRO's past setback with Chandrayaan-2 and its upcoming mission, Chandrayaan-3. The article underscores the importance of orbiter missions, which have a relatively lower failure rate. Overall, it aims to inform readers about the complexities and achievements in lunar exploration.

Details of Data

The author uses historical data on lunar missions to tell the story. The data source isn't directly specified. The data is presented in the form of tables and charts, providing information on the success, partial failure, and failure percentages of completed lunar missions undertaken by various space agencies and operators since 1958. The data specifically focuses on the outcomes of different types of lunar missions, such as sample return, robotic lander, crewed lander, orbiter, flyby, and impactor missions.

Type of Data: The data used is quantitative and numerical in nature, as it involves percentages of success, partial failure, and failure rates for different types of lunar missions.
Extent of the Data: The data covers lunar missions undertaken by all space agencies and operators since 1958, spanning several decades of space exploration.
Dimensions of the Data: The data is multidimensional, with each row representing a specific lunar mission, and each column representing different aspects of the mission's outcome (success, partial failure, failure). The data is further categorized based on the type of mission (sample return, robotic lander, etc.).
Gaps in the Data: The success rates provided are cumulative percentages, and the data might not capture the underlying reasons for the successes or failures of individual missions. Additionally, the data does not provide details about the specific missions' objectives, the technological challenges they faced, or the mission's specific outcomes beyond success, partial failure, or failure.
Essential Data: The essential data used by the author includes the success, partial failure, and failure percentages for different types of lunar missions. This data helps to highlight the historical performance and risks associated with each type of mission.
Irrelevant Data: The author does not include any irrelevant data in the article. The provided data is relevant to the article's main objective of discussing lunar mission success rates and the challenges of landing on the moon.

Data Encoding

The data seems to be encoded in a tabular format with rows representing individual lunar missions and columns representing the success, partial failure, and failure percentages for different types of missions.

Potential problems with the data encoding could include:

Limited Context: The tabular format may lack detailed context about each mission, such as the specific objectives, technical challenges faced, or mission outcomes beyond success, partial failure, or failure. This limited context might make it difficult to fully understand the reasons behind the mission's results.
Data Aggregation: The data appears to be aggregated by mission type and percentages, which might obscure variations within each category. This could lead to generalizations and miss insights that might be present at the individual mission level.
Lack of Time Series Analysis: The data is presented in aggregate percentages, but it does not include a time series analysis. This means that trends and improvements over time in lunar missions' success rates might not be readily apparent.

Suggestions

Detailed Mission Profiles: Including detailed profiles for each mission, highlighting objectives, technical challenges, and outcomes, would provide a more comprehensive understanding of the factors contributing to success or failure.
Time Series Analysis: Presenting the data as a time series, showing success rates over decades, would allow readers to observe trends and improvements in lunar missions over time.
Case Studies: Supplementing the quantitative data with case studies of specific missions could provide richer insights into the complexities and achievements of lunar exploration.
Visualizations: Utilizing visualizations such as charts, graphs, or interactive dashboards could make the data more accessible and facilitate better understanding of the patterns and trends.
Contextual Information: Providing contextual information about the broader lunar exploration landscape, technological advancements, and challenges faced by space agencies would enhance the reader's understanding of the data.
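The time series suggestion above can be sketched as a simple aggregation of mission outcomes by decade. This is only an illustration: the records below are hypothetical and are not the article's data, and `success_rate_by_decade` is a name introduced here.

```python
from collections import defaultdict

# Hypothetical (launch year, outcome) records; NOT the data used in the article.
missions = [
    (1959, "failure"), (1959, "success"), (1966, "success"),
    (1966, "failure"), (1969, "success"), (1971, "partial failure"),
    (2019, "failure"), (2019, "success"),
]


def success_rate_by_decade(records):
    """Return {decade: success rate in %} with decades in ascending order."""
    counts = defaultdict(lambda: {"total": 0, "success": 0})
    for year, outcome in records:
        decade = (year // 10) * 10  # e.g. 1966 -> 1960
        counts[decade]["total"] += 1
        if outcome == "success":
            counts[decade]["success"] += 1
    return {
        decade: round(100 * c["success"] / c["total"], 1)
        for decade, c in sorted(counts.items())
    }


rates = success_rate_by_decade(missions)
print(rates)
```

Here partial failures are counted as non-successes, which is an assumption; they could equally be tracked as a separate series.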

Redesigned charts

Lunar missions by year

sejalanandIITM commented 1 year ago

The Evolution of the Scripps Spelling Bee

Link: https://www.thehindu.com/data/data-the-evolution-of-the-scripps-spelling-bee/article66947879.ece

What is the story the author is trying to tell?

The author is trying to tell the story of how the Scripps National Spelling Bee has evolved over the years, particularly regarding the difficulty level of winning words and their frequency of usage in published books. The story highlights the trend of winning words becoming increasingly obscure and the changes in word usage frequencies over time.

Data used by the author

The author is using data from the Google Books Ngram Viewer's frequency ratings to tell the story. The Ngram Viewer provides information on the frequency of words found in a vast corpus of 5.2 million books published between 1800 and 2019. This data helps the author track how often the winning words of various editions of the Scripps Spelling Bee were used in books during specific periods.

The type of data

The story uses textual data (nominal data) in the form of the winning words, together with quantitative data: the respective frequencies of those words within the corpus and their lengths.

The extent of the data

The extent of the data used in the story is limited to the Google Books corpus, which contains books published between 1800 and 2019. For each winning word of the Scripps Spelling Bee, the author considers a 20-year period surrounding the contest year. This period includes 10 years before the contest year and 10 years after it.

The dimensions of the data

The dimensions of the data used in the story include:

Contest Year: This dimension represents the year in which each edition of the Scripps Spelling Bee took place. Each contest year is associated with a specific winning word.
Winning Word: This dimension contains the actual words that were declared as winners in each edition of the Scripps Spelling Bee.
Frequency Percentage: This dimension represents the frequency of each winning word in the Google Books corpus during a 20-year period surrounding the contest year. The frequency percentage indicates how often the word appeared in books published during that time frame.
Word Length: This dimension refers to the length of the winning words in terms of the number of letters they contain.
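The 20-year window described above can be sketched as follows. This is a rough illustration with hypothetical yearly frequencies. Whether the author includes both end years in the window is not stated, so treating the window as inclusive is an assumption, and `window_average` is a name introduced here.

```python
def window_average(freq_by_year, contest_year, half_window=10):
    """Mean frequency over contest_year - half_window .. contest_year + half_window
    (inclusive), using only the years that are present in the data."""
    values = [
        freq for year, freq in freq_by_year.items()
        if contest_year - half_window <= year <= contest_year + half_window
    ]
    if not values:
        raise ValueError("no data inside the window")
    return sum(values) / len(values)


# Hypothetical yearly frequencies of one word; NOT Ngram Viewer values.
freq = {1990: 2e-8, 2000: 4e-8, 2010: 6e-8, 2019: 9e-8}

avg_2000 = window_average(freq, 2000)  # uses 1990, 2000 and 2010
avg_2015 = window_average(freq, 2015)  # window is cut short at 2019
print(avg_2000, avg_2015)
```

The second call shows the gap noted below: for recent contest years the corpus ends in 2019, so the "10 years after" half of the window is only partly available.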

Gaps in the data

Limited to Written Language: The data is sourced from the Google Books corpus, which primarily consists of written text from books. This means that spoken language, internet sources, social media, and other forms of communication are not included in the dataset.
Time Constraints: The data from the Google Books corpus covers books published up until the year 2019. As a result, any developments or changes in language usage or vocabulary that occurred after 2019 are not captured in the analysis.

Essential Data:

Contest Year: It forms the basis for tracking the evolution of winning words over different editions of the Spelling Bee.
Winning Words: The actual words that were declared winners each year, showcasing the trend of increasing word obscurity.
Frequency Percentage: Reveals how common or rare winning words were in written literature during specific periods, supporting the analysis of word difficulty trends.
Word Length: Provides additional context for understanding the difficulty level of each winning word.

Irrelevant Data:

Specific Shortest and Longest Words are an interesting observation, but not crucial to understanding word difficulty and frequency patterns.

Author Charts

Content Frequency of winning words

5-year average frequency of winning words

Frequency of some common words

20 winning words with the least frequency

20 most frequent and 20 least frequent spelling bee winning words

Word length of winning words

Data Encoding

The data is primarily encoded in natural language text. In terms of visual encodings, bar charts and line charts are used and the story is presented in a narrative form. Colour encoding is used in the infographic that shows the 20 most frequent and 20 least frequent winning words.

Problems with visual encodings

The author wants to put the tiny numbers of the average frequency of winning words in context, by looking at the frequencies of some of the most common words as recorded in 2019 (averaging 10 years before and 10 years after) with the following visualization:

Suggested Improvement: To put these tiny numbers in context by comparing them to the frequency of the most common words, those tiny numbers should also be present in the same graph to show the difference. The following chart shows a comparison in the intended way.

In the following chart, the 20 least frequent winning words among all Spelling Bee winning words are shown. The many zeroes in the x-axis labels make the chart less readable.

Suggested Improvement: Scientific notation (powers of ten) can be used to make the numbers easy to read at a glance.
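As a small sketch of the suggested relabelling, the snippet below formats a tiny frequency in scientific notation using Python's standard format specifier. The value is made up for illustration, and the helper names `to_scientific` and `to_power_of_ten` are introduced here.

```python
def to_scientific(value, digits=2):
    """Format a number such as 0.0000000123 as '1.23e-08'."""
    return f"{value:.{digits}e}"


def to_power_of_ten(value, digits=1):
    """Format a number as 'mantissa × 10^exponent' for use as an axis label."""
    mantissa, exponent = f"{value:.{digits}e}".split("e")
    return f"{mantissa} × 10^{int(exponent)}"


example = 0.0000000123  # hypothetical frequency, not a value from the article
print(to_scientific(example))
print(to_power_of_ten(example))
```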

Sejal Anand 21f1002620

savindraiitm commented 1 year ago

Savindra Singh Shekhawat 21f1003973

A comparison of India’s growth with other nations

Article link

Date Published: August 15, 2022 10:50 am | Updated August 15, 2022 09:03 pm IST
Authors: VIGNESH RADHAKRISHNAN, REBECCA ROSE VARGHESE

Intent of Original Article

The authors attempt to illustrate how India's growth compares with that of other countries.

The article shows the growth of countries based on various parameters such as Population, HDI (Human Development Index), GDP, Infant mortality rate (IMR), Women in parliament, Net migration, Access to electricity, Individuals using the internet, CO2 emissions, and Electricity from renewable resources.

Analysis of Original Visualizations

Data Used: Values for each country on various parameters such as population, GDP, etc. Countries are color-coded by group, such as G7, Indian Subcontinent, BRICS, and Emerging Economy.
Type of Data: The data is captured at two points in time: one from around 1960 and one close to the latest available date, e.g. 2021.
Extent of the Data: The data shows values from several decades ago alongside the latest ones.
Dimensions of the Data: The data is multidimensional, with one axis capturing time (mostly in years) and the other axis capturing the country or group the data is for.
Encoding of the Data: The data is represented as bubble charts in which each bubble represents a country. For each parameter there are two charts: one for the old data and one for the latest data.

Suggestions

Simplify Charts: The charts above encode too much data and show unnecessary information. For example, each country is represented as a bubble, but it is not easy to identify which country a bubble represents.

Merge charts (old and latest): The data is represented over time, so combining both time points in a single chart would allow better comparison.

A ratio parameter can be included to further improve the reader's understanding of this topic.
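One possible reading of the ratio parameter is the latest value divided by the old value for each country. The sketch below assumes that reading. The country names and values are hypothetical, and `growth_ratio` is a name introduced here.

```python
# Hypothetical indicator values at the two time points; NOT data from the article.
indicator = {
    "Country A": {"old": 50.0, "latest": 150.0},
    "Country B": {"old": 80.0, "latest": 120.0},
}


def growth_ratio(old, latest):
    """Return latest / old, i.e. how many times the indicator has grown."""
    if old == 0:
        raise ValueError("old value is zero; the ratio is undefined")
    return latest / old


ratios = {
    country: growth_ratio(v["old"], v["latest"])
    for country, v in indicator.items()
}
print(ratios)
```

A ratio above 1 means the indicator grew, which lets countries with very different absolute values be compared on one axis.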

Redesigned charts

The above changes can also be applied to the other parameters present in the article.

mb1AtGithub commented 1 year ago

Manisha Bapat 21f1000449

The risk of small States’ heavy reliance on the Union government

Original article: https://www.thehindu.com/data/data-the-risk-of-small-states-heavy-reliance-on-the-union-government/article67095283.ece

A) What is the story the author is trying to tell? In India, the total revenue receipts for a State constitute transfers from the Union government such as the State’s share in Union taxes including income tax, corporation tax, and grants, and the State’s own revenues from tax and non-tax sources. The small States (i.e. States with a population of less than 1 crore), have distinctive characteristics that limit revenue mobilization. Recognizing these disabilities, the Constitution has provided mechanisms to address them. The author wants to convey how these States continue to rely heavily on the Union government for revenue. This dependence creates vulnerabilities for the States as well as the Union.

B) What data he/she is using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.

1. The author uses simple line charts that show the major revenue-receiving small States along with all States combined, to contrast their revenue receipts.
2. The author explains how some States have consistently relied on transfers from the Union for ~90% of their total revenue receipts. *
3. Also, in the second half of the story, the author explains the 3 major risks involved for the States in relying on the Center’s funds, and how this possibly reflects the States’ inability to mobilize taxes well enough to show significant improvement.
4. The author further suggests certain possible measures to improve tax administration in the States. Not only will this lead to higher resource mobilization, but it will also reduce the deviation of actual from budgeted tax revenues. *
5. The data consists of numerical revenue figures, with State names as the categorical variable.
6. The current visualization shows the trend over 10 years of how heavily the small States depend on the Union government. While this is indeed insightful, we also know that help from the Union government to the small States is provided for by the Constitution, because it was, and is, already established that these small States have some limitations. So even 3 years of data is enough to see if there has been any improvement. It also makes sense to show the current-year data so that some action can be taken immediately. Hence, in addition to the visualizations provided by the author, I have also included visualizations for the current year only.

C) How is it encoded, what problems are with it, and how have you attempted to improve it? Good points:

The categorical variables are encoded by using colors on the different line charts.
Line Chart 1 (Center’s contribution to TR : TR ratio) represents the trend for 10 years well.
Line Chart 2 (OTR : GSDP) also shows the 10-year trend.
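The two ratios plotted in these line charts can be computed as simple percentages. The sketch below uses made-up figures for a single hypothetical State, and it assumes that TR means total revenue receipts and OTR means own tax revenue. The function name `share_percent` is introduced here.

```python
def share_percent(part, whole):
    """Return part as a percentage of whole."""
    if whole == 0:
        raise ValueError("whole is zero; the ratio is undefined")
    return 100 * part / whole


# Hypothetical figures in Rs. crore for one small State; NOT data from the article.
total_revenue_receipts = 1000.0
central_transfers = 900.0
own_tax_revenue = 60.0
gsdp = 2000.0

center_share = share_percent(central_transfers, total_revenue_receipts)
otr_to_gsdp = share_percent(own_tax_revenue, gsdp)
print(center_share, otr_to_gsdp)
```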

Possible Improvements:

Both line charts can have dots that show year so the user does not have to move along the Y direction to locate that year on the line of a specific state.

My additional visualizations: While the existing charts are good enough, certain points explained in the story do not have supporting visualizations or need to be inferred from the ones given. Hence, I have added more visualizations to support them.

Chart 1:

Chart2:

Chart 3: Current transfers to revenue receipts ratio (in %) for States. Supports point B)2). It shows how high this ratio is for some States. We also see that Goa, Himachal Pradesh and Sikkim have slightly improved (reduced %) compared to the last 2 years, while all other small States have either the same or a higher %.

Chart 4: Center’s revenue contribution to total revenue receipts ratio (%). Supports B)2) for the current year, compared with the ratio for all other States (~43%).

Chart 5:
Tax revenue vs non tax revenue generated by the small states. Supports B)4) for current year, where one can see potential to improve either tax revenue or non-tax revenue or both.

faizanxmulla commented 1 year ago

India’s staggering wealth gap.

Name : Faizan Mulla Roll No. : 21f1003885

Article Link : India's Staggering Wealth Gap in Five Charts

Data : Credit Suisse’s Global Wealth Databook 2014

Author : Rukmini S

SECTION 1 : Story the Author is trying to tell / Intent of original story

The Author provides the following visualizations to support her claim :

Horizontal stacked bar chart showing the share of wealth held by different classes (poor, middle, rich) in various regions or countries.
Line graph presenting the wealth share of India's top 10% over time.
Line graph showing the wealth share of the top 1% in various countries or regions over time.
Stacked bar chart depicting the share of various countries or regions in the global population of the poor, middle class, and rich.
Pie chart illustrating the distribution of the global top 1% of wealth holders by region.

Analysis and Critique of each visualization :

Graph 1 :

Analysis: This stacked bar chart represents the proportion of wealth held by different classes (poor, middle, and rich) in various regions and countries in 2014. Each bar represents a region or country and is divided into sections corresponding to the different classes.
Critique: While this graph allows for a good comparison of wealth distribution across different regions and countries, it lacks a temporal dimension, preventing us from observing any trends or changes over time. The precise numerical values or percentages for each class are also missing. Furthermore, there's no clear definition of what constitutes 'poor', 'middle', or 'rich' wealth classes. Providing these details could enhance the interpretability of the data.

Graph 2 :

Analysis: This line graph depicts the percentage of total wealth owned by the top 10% of India's population from 2000 to 2014. The x-axis represents time in years, and the y-axis represents the percentage of wealth share.
Critique: The graph provides a clear depiction of the wealth share trend of India's top 10% over two decades. However, it would be beneficial to have context or correlating factors to understand the causes behind these trends. Including data on major economic policies, GDP growth, and income inequality during these years could enrich the analysis.

Graph 3 :

Analysis: This line graph displays the percentage of total wealth owned by the top 1% of the population in different countries or regions from 2000 to 2014. The x-axis represents time in years, and the y-axis represents the percentage of wealth share.
Critique: While this graph provides a useful comparison of wealth share trends across different countries, understanding the reasons behind these disparities or similarities is challenging without additional context. Incorporating data on the economic and political context in these countries, such as GDP growth, income inequality, and taxation policies during these years, could provide a more comprehensive view.

Graph 4 :

Analysis: This stacked bar chart displays the percentage of the global poor, middle, and rich populations that belong to different regions and countries in 2014. Each bar represents a region or country, and is divided into sections that correspond to the poor, middle, and rich classes.
Critique: While the graph gives a clear snapshot of the wealth distribution in 2014, it lacks temporal data, making it impossible to observe trends or changes over time. Additionally, the chart doesn't provide precise percentages or numerical values for each class, making it difficult to obtain exact figures. Furthermore, there's no clear definition of the wealth boundaries used to categorize individuals into 'poor', 'middle', and 'rich' classes. It would be beneficial to include this information for better understanding and interpretation of the data.

Graph 5 :

Analysis: This pie chart represents the proportion of the world's top 1% wealthiest individuals from different regions or countries.
Critique: The pie chart provides a clear picture of the regional distribution of the world's wealthiest individuals. However, it doesn't provide information on how this distribution has changed over time. The exact percentages or numbers of individuals in each region are also missing. Furthermore, additional data on how wealth is distributed within this top 1% in each region could offer a deeper understanding of wealth inequality.

SECTION 2 : Dataset used and its description in detail

(answer to the following question : What data he/she is using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.)

There are 5 tables that contain the data. They are described in further depth as follows:

Table 4-1: This table shows the estimated wealth distribution by region and for selected countries in 2014.
Table 4-2: This table provides the shares of total wealth held by the top 1% and top 10% of adults in 2000, 2007, and 2014 for various regions and countries.
Table 4-3: This table presents the change in wealth shares of the top 1% and top 10% between different periods (2000-2007, 2007-2014, and 2000-2014) for various regions and countries.
Table 4-4: This table gives an overview of the Gini coefficient, a measure of inequality, for various regions and countries in 2000, 2007, and 2014.
Table 4-5: This table provides a detailed breakdown of the changes in wealth shares of the top 1% and top 10% for various countries and regions between 2000 and 2014, categorized by the speed of change.

Table 4-1 :

This table presents cross-sectional data on wealth inequality across multiple countries. The data is classified as follows:

Country : Categorical data representing various countries across the world.
Wealth Gini : Continuous data representing the Gini coefficient, a measure of inequality (with values between 0 and 1, where 0 represents perfect equality and 1 represents perfect inequality).
Top 10% wealth share: Continuous data representing the percentage of total wealth owned by the top 10% of the population.
Top 1% wealth share: Continuous data representing the percentage of total wealth owned by the top 1% of the population.
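To make these three measures concrete, here is a minimal sketch that computes a Gini coefficient and a top-share percentage from a list of individual wealth values. The wealth values are hypothetical, non-negative wealth is assumed, and this is the standard sample formula rather than the method used in the Credit Suisse databook. The names `gini` and `top_share` are introduced here.

```python
def gini(values):
    """Gini coefficient of non-negative values, using the sorted-rank formula
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_share(values, fraction):
    """Percentage of the total held by the richest `fraction` of holders."""
    xs = sorted(values, reverse=True)
    k = max(1, round(len(xs) * fraction))  # number of holders in the top group
    return 100 * sum(xs[:k]) / sum(xs)


# Hypothetical wealth of ten adults; NOT data from the databook.
wealth = [1] * 9 + [11]
print(gini(wealth), top_share(wealth, 0.10))
```

With this formula a perfectly equal list gives 0, while one person holding everything gives (n - 1) / n, which approaches 1 as the population grows.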

The data spans a wide range of countries but it seems to be limited to a single time period, as there is no temporal dimension included. Therefore, the data cannot be used to analyze trends over time.

The essential data in this table are the measures of inequality, i.e., the Gini coefficient and the top 10% and top 1% wealth shares, as they directly pertain to the topic of wealth inequality. The country names are also essential as they provide context for the inequality measures.

As for the irrelevant data, without more information about the context of the analysis, it's difficult to say definitively what's irrelevant. However, if the focus is solely on wealth inequality, details such as population size, GDP, or other economic indicators might be seen as less directly relevant.

Table 4-1

Table 4-2 :

This table presents longitudinal data on wealth inequality across various global regions for different periods. The data is classified as follows:

Region: Categorical data representing various geographical regions across the world.
Period: Categorical data representing different time periods (2000-2007, 2007-2014, and 2000-2014).
Top 10% wealth share change: Continuous data representing the change in the percentage of total wealth owned by the top 10% of the population across the specified time period.
Top 1% wealth share change: Continuous data representing the change in the percentage of total wealth owned by the top 1% of the population across the specified time period.

This data provides a regional and temporal view of changes in wealth inequality. However, there might be gaps in the data, as the change in wealth share is not broken down by individual countries within each region.

The essential data in this table are the changes in wealth share for the top 10% and top 1%, as well as the periods and regions, as these help track the temporal and regional trends in wealth inequality.

Without more context, it's hard to identify definitively irrelevant data. However, additional economic or demographic data that don't directly pertain to wealth inequality might be less relevant in this analysis.

Table 4-2

Table 4-3 :

This table presents longitudinal data on the wealth share of the top 1% in various countries across different time periods. The data is classified as follows:

Country: Categorical data representing various countries across the world.
Year: Categorical data representing different years from 2000 to 2014.
Wealth share of top 1%: Continuous data representing the percentage of total wealth owned by the top 1% of the population for each year.

This data allows for analysis of the change in the wealth share of the top 1% over time in different countries. The table does not cover all countries, so there are gaps in the data.

The essential data are the wealth share of the top 1%, the countries, and the years, as these data points allow for tracking changes in wealth inequality over time in different countries.

Again, without more context, it's hard to definitively identify irrelevant data. However, additional economic or demographic data that don't directly pertain to wealth inequality might be less relevant in this analysis.

Table 4-3

Table 4-4 :

This table is very similar to Table 4-3, but it presents data on the wealth share of the top 10% instead of the top 1%. The data is classified in the same way.

Table 4-4

Table 4-5 :

This table presents longitudinal data on changes in wealth inequality in various countries over different periods. The data is classified as follows:

Country: Categorical data representing various countries across the world.
Period: Categorical data representing different time periods (2000-2007, 2007-2014, and 2000-2014).
Change in wealth share of top 10%: Categorical data representing the qualitative change (e.g., "rise," "rapid rise," "flat") in the percentage of total wealth owned by the top 10% of the population across the specified time period.
Change in wealth share of top 1%: Categorical data representing the qualitative change (e.g., "rise," "rapid rise," "flat") in the percentage of total wealth owned by the top 1% of the population across the specified time period.

As for irrelevant data, given that the focus is on wealth inequality trends over time, any data that doesn't directly relate to the changes in wealth share for the top 10% and top 1% might be considered less relevant.

Table 4-5

These tables provide a comprehensive overview of wealth inequality trends over time, across different countries and regions. The data seems to be extensive and carefully selected to focus on the wealth shares of the top 1% and top 10%, which are common measures of wealth inequality. The potential gaps in the data might include specific breakdowns within each country or region, or other demographic or economic factors that could influence wealth inequality. However, these gaps likely don't detract from the main story the data is telling about global wealth inequality trends.

21f1004666 commented 1 year ago

The gender gap in clinical trials and disease funding

Name: Andiboyina Mourya Chakradhar Nagesh

Roll no.: 21f1004666

Article Link: Here

Authors: The Hindu Data Team

Note: Since this is a premium article, I used Internet Archive (Wayback machine) to access this article.

Original Intent of the article:

This article aims to highlight the gender disparity in healthcare research, both in clinical trials and in the funding of research to prevent or cure diseases. Using a unit called the Disability-Adjusted Life Year (DALY), which quantifies the burden of a disease, it shows that the female population is more affected by female-dominant diseases than the male population is by male-dominant diseases. But when it comes to research funding, it shows that diseases that dominate in the male population receive more funding than female-dominant diseases. The article also points to the lower participation of the female population in clinical trials despite their high DALY values.

Understanding the charts

One of the key quantities shown in the charts is the DALY value. DALY, the Disability-Adjusted Life Year, is a measure of disease burden that quantifies the number of years lost due to ill health, disability, or early death. It is calculated as the sum of the years of life lost due to premature mortality in the population and the years lost due to disability among people living with the health condition. The higher the DALY value, the larger the effect on that gender.
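To make the definition concrete, here is a minimal sketch of the sum described above. The function name `daly` and the input numbers are my own illustrative assumptions; they are not values from the article or the research paper.

```python
def daly(yll: float, yld: float) -> float:
    """Return DALYs as years of life lost (YLL) plus years lived with disability (YLD)."""
    if yll < 0 or yld < 0:
        raise ValueError("YLL and YLD must be non-negative")
    return yll + yld


# Hypothetical inputs for illustration only (not taken from the article).
example = daly(yll=120.0, yld=80.0)
print(example)
```

A disease with few deaths but many years lived with disability can therefore still have a high DALY value.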

Data

Since the article names its sources but does not provide the data itself, I found the data the article uses in a research paper (link). Charts 1 and 2 use the data found in tables 1-4 combined. The attributes of the data are Diseases, Dominant-type, DALY values, and Amount Spent. Since the data is combined, the overall number of items in the data is 42.

Chart 1

Original Visualization

Criticisms

While the chart itself is well represented with color encoding, it has a few issues, namely:

It doesn't mention the scaling of the bubbles in the chart.
A lot of bubbles in the chart have missing labels.
Missing title of the chart to give context to the reader.

A few of these issues could have been avoided by providing an interactive chart (which can give context by showing values when hovering over the chart) rather than a static image.
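On the bubble-scaling point, a common convention is to make the bubble area, not the radius, proportional to the value, and to state that in the chart. The sketch below assumes that convention; I do not know which scaling the original chart used, and `bubble_radius` and `max_radius` are hypothetical names.

```python
import math


def bubble_radius(value: float, max_value: float, max_radius: float = 40.0) -> float:
    """Return a radius such that the bubble AREA is proportional to `value`."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    # Area grows with radius squared, so take the square root of the value ratio.
    return max_radius * math.sqrt(value / max_value)


# A value that is a quarter of the maximum gets half the radius,
# which gives it a quarter of the area.
print(bubble_radius(25, 100))
```

If the radius were scaled linearly instead, a value four times larger would be drawn with sixteen times the area, which exaggerates differences.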

Improved visualization

Interactive Link

Chart-2

Original Visualization

Criticisms

The main issue of this graph is that a lot of points are clustered near the bottom left of the graph.
This makes it harder for the reader to identify the markers other than those that appear as outliers.
It also doesn't provide the information that male-dominant diseases with similar DALY receive higher funding than the female-dominant diseases.
Missing title in the chart
Missing legend for the color encoding in the chart.

Improved visualization

chart_2

Chart-3

Criticism

Chart-3 is an interactive chart that provides values when hovering over the markers. However, the main criticism of this graph is the scaling of the x-axis. Since the x-axis scale starts at 40, the graph misleads the reader by making the difference between the markers look much larger than it is.
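The effect of a truncated axis can be shown with a little arithmetic. In this sketch the axis start of 40 comes from the chart, but the two marker values (60 and 50) are hypothetical, and `apparent_ratio` is a name I made up for illustration.

```python
def apparent_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of the drawn distances of two values from the start of the axis."""
    if min(a, b) <= axis_start:
        raise ValueError("both values must lie beyond the axis start")
    return (a - axis_start) / (b - axis_start)


# Hypothetical marker positions, for illustration only.
true_ratio = apparent_ratio(60, 50)                   # axis starts at 0
drawn_ratio = apparent_ratio(60, 50, axis_start=40)   # axis starts at 40
print(true_ratio, drawn_ratio)
```

With these numbers a 20% difference is drawn as if one value were double the other.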

Chart-4

Criticism

Like chart-3, chart-4 is an interactive chart. It suffers from the same scaling issue as chart-3 and thereby potentially misleads the reader. Another issue in chart-4 is the visual encoding. The markers for cancer cannot be visually distinguished, which might be an error by the author, given that the text in the legend is almost the same.

NOTE: Due to the lack of data for chart-3 and 4, I wasn't able to create improved visualizations.

S-D-P commented 1 year ago

Name: Siddhi Dhirajkumar Pandirkar RollNo: 21f1001177

Article: Data | Antarctic sea ice cover hits record lows on many days of 2023

Understanding the Original Story: The story aims to highlight the concerning trend of decreasing Antarctic sea ice cover in 2023. It mentions that on February 19, 2023, the sea ice extent reached the lowest level ever recorded. The decline in sea ice extent is an ongoing trend, and in the last six years, Antarctic sea ice cover has witnessed significant declines. This reduction in ice has wide-ranging impacts, including rising global sea levels, altered water flow patterns, effects on weather patterns and ecosystems, and the disruption of the Antarctic food chain.

Data Used in the Original Story: The data used in the original story is the Sea Ice Index of the National Snow and Ice Data Center and the National Oceanic and Atmospheric Administration. The Sea Ice Index provides a quick look at Arctic- and Antarctic-wide changes in sea ice. It is a source for consistent, up-to-date sea ice extent and concentration images, in PNG format, and data values, in GeoTIFF and ASCII text files, from November 1978 to the present. Sea Ice Index images also depict trends and anomalies in ice cover calculated using a 30-year reference period of 1981 through 2010.

Parameter(s): ICE EXTENT, ICE GROWTH/MELT, SEA ICE CONCENTRATION
Platform(s): DMSP, DMSP 5D-3/F17, DMSP 5D-3/F18, Nimbus-7
Sensor(s): SMMR, SSM/I, SSMIS
Data Format(s): PNG, GeoTIFF, CSV, Shapefile
Temporal Coverage: 26 October 1978 to present
Temporal Resolution: 1 day
Spatial Resolution: 25 km x 25 km
Spatial Reference System(s): NSIDC Sea Ice Polar Stereographic North (EPSG:3411); NSIDC Sea Ice Polar Stereographic South (EPSG:3412)
Spatial Coverage: N: -39.23, S: -90, E: 180, W: -180 (Southern Hemisphere); N: 90, S: 30.98, E: 180, W: -180 (Northern Hemisphere)

Identifying Gaps and Relevance of Data: The data from the Sea Ice Index is relevant and comprehensive for the analysis, covering sea ice extent and concentration images, as well as data values from 1978 to the present. The index also provides trends and anomalies in ice cover, which adds to the understanding of the long-term changes in Antarctic sea ice. Therefore, I feel like there are no apparent gaps, and the data appears to be relevant and comprehensive for telling the story.

Visualizations in the Original Story:

The visual encodings in the first two charts are a bit messy. Overall, I feel that the charts are correct and to the point, but they lack the ability to evoke a stronger emotional reaction or grab the reader's attention, considering the seriousness of the topic. Visual enhancements can be made to create a more attention-grabbing and impactful data story.

Redesigning the Data Story: In the redesign process, the focus was on enhancing data encodings to ensure improved clarity in the visualizations. Impactful colors and annotations were introduced to the charts, effectively highlighting key trends and the record-low sea ice extent on February 19, 2023. To create a compelling narrative, a very simple infographic was made, emphasizing the alarming implications of the drastic reduction in Antarctic sea ice cover. The data story now effectively communicates the urgency of addressing climate change to protect polar environments and mitigate the impact on global sea levels and weather patterns.

These enhancements in the data visualization and storytelling allow readers to grasp the significance of the declining Antarctic sea ice cover, and the result is more eye-catching than the original data story.

Tool - Canva & NSIDC Data Visualizer

SrijanShukla commented 1 year ago

Srijan Shukla 21f1000671

Article: How much employment generation does the economy need?

Brief: The author is trying to debunk the government's claim that there is not much unemployment in India as adequate employment is being generated. The crux of the argument lies in the differences in the estimates for the annual employment generation requirement. The author challenges the government's assumption that India needs to create 5-8 million jobs per year. Instead, he argues that this figure barely scratches the surface of the actual problem, due to the increasing number of young people entering the workforce every year and the vast number of unorganized sector workers needing proper work.

Story: The author is attempting to illustrate the potential influx of individuals into the labor force in the year 2022, based on the year they were born and the percentage who survived. It demonstrates how different age groups (corresponding to different years of birth) will contribute differently to the labor force.

Data used: The data consists of numerical demographic data, specifically the count of individuals who are predicted to join the labor force in 2022 from birth cohorts of 2000, 2002, and 2007. The data also includes a percentage associated with each year, presumably indicating a survival rate or an approximation of who is likely to join the labor force from each birth year cohort. The total population potential for each cohort is derived from these percentages.

Type of data: The data is quantitative, consisting of counts of individuals and percentages.

Extent of the data: The data covers three different birth cohorts (years 2000, 2002, and 2007) and predictions for their potential entry into the labor force in 2022.

Dimensions of the data: The data has two dimensions: the birth year (categorical variable) and the count of potential labor force entrants (numerical variable).

Gaps in the data: The data does not provide information about the methods used to calculate these potential labor force entries or how the percentages were determined. It also lacks demographic details like gender, geographical location, education level, etc. Moreover, it's a one-time snapshot and doesn't provide data for other years.

Essential data: The essential data is the birth year, the percentage of survivors likely to join the labor force, and the resulting count. This data allows the author to make the desired prediction.

Irrelevant data: Without additional context, it's hard to determine if there's irrelevant data.

Data Encoding and Improvement:

Encoding: The data is encoded in percentages and raw counts. The percentages indicate the proportion of each birth year cohort that is likely to enter the labor force, and the raw counts represent the total potential entries into the labor force from each cohort. The data is also presented in a tabular format, making it easy to compare the potential contribution of each birth cohort to the labor force.

Problems: The data's tabular presentation lacks a visual component, which might help readers better understand the contributions of each birth cohort. It's also unclear how the percentage and raw counts were calculated, leading to potential interpretation issues.

Also, Table 2 is just a list of conclusions and the data has not been used to properly arrive at those conclusions.

Improvements: To improve this, I suggest creating a pie chart and bar plots to visualize the data. This makes it easier to understand the contribution of each birth cohort to the potential labor force. For a complete picture, it would be beneficial to have data from multiple years to understand the trend over time. A stacked area plot would be a great tool to visualize this if such data is available.

In this visualization, each slice of the pie represents a different age group. The size of each slice corresponds to the size of the potential labour force entries from that group. The percentages shown in the pie chart represent the proportion of total potential entries that each group contributes.

also,

A stacked area plot can help visualize the cumulative contribution of each age group to the potential labor force over the years.

Thank you.

dhruvsanan commented 1 year ago

Dhruv Sanan 2023716

E-rickshaws to two-wheelers: The shift in the share of electric vehicles

https://www.thehindu.com/data/data-e-rickshaws-to-two-wheelers-the-shift-in-the-share-of-electric-vehicles/article67090337.ece

Original intent of the story:

The author of the article is trying to tell the story of the shift in the mix of electric vehicles in India from e-rickshaws to two-wheelers. The article states that there has been a dramatic shift in the class of electric vehicles over time. In the initial years, between FY2015 and FY2020, when the number of electric vehicles was growing at a relatively slow pace, the share of e-rickshaws in the mix was much higher than the share of electric bikes. However, in the last four fiscal years, as the absolute number of electric vehicles has increased at a quicker pace, the share of electric bikes has surged and surpassed the share of e-rickshaws. The author concludes by saying that India has a huge potential to become a global leader in the electric vehicle market, given its large population, growing urbanization, rising income levels, and environmental concerns. However, to achieve this goal, India needs to overcome the existing challenges and create a conducive ecosystem for electric mobility.

The author is using various types of data to tell the story of the electric vehicle market in India. Some of the details of the data are:

The author uses quantitative data such as numbers, percentages, and ratios to show the trends and patterns in the number, share, and distribution of electric vehicles across India. For example, the author uses data from the Ministry of Road Transport and Highways to show that the number of electric vehicles in India has surged from just 2,400 a decade ago to over 27.4 lakh as of July 2023. The author also uses data from the Society of Manufacturers of Electric Vehicles to show that there has been a dramatic shift in the class of electric vehicles over time, from e-rickshaws to two-wheelers.

The extent of the data

The author uses geospatial data such as maps and charts to show the inter-State disparity in the penetration of electric vehicles. For example, the author uses a map from The Hindu Data Team to show that Assam ranked first with a share of 2.2% electric vehicles, followed by Tripura (about 2%). Delhi, Bihar, Uttarakhand, Uttar Pradesh, and Goa had a share of over 1%. Among the major States, Himachal Pradesh had the lowest share with 0.11%, followed by Punjab (0.26%), Andhra Pradesh (0.40%), West Bengal (0.44%), and Madhya Pradesh (0.47%). The author uses comparative data such as tables and graphs to compare the electric vehicle market in India with other countries and regions. For example, the author uses a table from The International Energy Agency to show that China had over 4.5 crore electric vehicles as of 2020, accounting for about 5% of its total vehicle fleet. The European Union had over 1 crore electric vehicles as of 2020, with a share of about 3%. The United States had over 18 lakh electric vehicles as of 2020, with a share of about 1%.

The dimensions of the data

The author uses qualitative data such as opinions, perspectives, and insights from various sources to discuss the challenges and opportunities for the growth of the electric vehicle market in India. For example, the author quotes Rajiv Kumar, Vice-Chairman of NITI Aayog, who said that India has a huge potential to become a global leader in the electric vehicle market. The author also cites Sohinder Gill, Director General of Society of Manufacturers of Electric Vehicles, who said that lack of charging infrastructure, high upfront costs, low consumer awareness, policy uncertainty, and technical issues are some of the major barriers that hinder the growth of the electric vehicle market in India.

Some of the gaps in the data that the author uses are:

The data on the number and share of electric vehicles in India is based on registration records, which may not reflect the actual usage and performance of electric vehicles on road. There may be cases where electric vehicles are registered but not used regularly or efficiently due to various reasons.
The data on the inter-State disparity in the penetration of electric vehicles does not account for the differences in population size, density, income levels, road conditions, and other factors that may affect the demand and supply of electric vehicles across States. There may be cases where States with lower share of electric vehicles have higher absolute number or density of electric vehicles than States with higher share.
The data on the comparison of electric vehicle market in India with other countries and regions does not account for the differences in definitions, classifications, standards, and methodologies that may affect the comparability and reliability of data across countries and regions. There may be cases where countries and regions have different criteria for defining what constitutes an electric vehicle or how to measure its share or performance.

Some of the data that is essential for telling the story are:

The data on the number and share of electric vehicles in India over time is essential for showing how the electric vehicle market has evolved and grown in India.
The data on the shift in the class of electric vehicles over time is essential for showing how consumer preferences and choices have changed in India.
The data on the inter-State disparity in the penetration of electric vehicles is essential for showing how different States have adopted and promoted electric mobility in India.
The data on the comparison of electric vehicle market in India with other countries and regions is essential for showing how India fares against other players in the global arena.

Some of the data that is irrelevant for telling the story are:

The data on the number and share of non-electric vehicles in India over time is irrelevant for telling the story of the electric vehicle market in India, as it does not directly relate to the topic or the main points of the story.
The data on the details of various schemes and incentives launched by the government to promote electric vehicles in India is irrelevant for telling the story of the electric vehicle market in India, as it does not provide any evidence or analysis of the impact or effectiveness of these schemes and incentives on the growth or performance of electric vehicles in India.

Author Charts

Data Encoding

The data in the article is encoded in various ways, such as tables, charts, maps, and text. The data is mainly sourced from the Ministry of Road Transport and Highways and the Society of Manufacturers of Electric Vehicles. The data covers the number, share, and distribution of electric vehicles in India from 2010 to 2023, as well as the comparison of electric vehicle market in India with other countries and regions.

Some of the problems with the data encoding are:

The tables and charts are not labeled clearly or consistently. For example, the table showing the number of electric vehicles in India does not have a title or a source, while the chart showing the shift in the class of electric vehicles has a title but no source or units. The chart also uses different colors for e-rickshaws and two-wheelers without explaining what they mean.
The maps are not scaled or projected properly. For example, the map showing the inter-State disparity in the penetration of electric vehicles uses a Mercator projection, which distorts the size and shape of States. The map also uses a color gradient to show the share of electric vehicles, but does not provide a legend or a scale to indicate what the colors mean.
The text is not concise or coherent. For example, the article uses long paragraphs and sentences that contain redundant or irrelevant information. The article also jumps from one topic to another without providing clear transitions or connections.

Some of the ways to improve the data encoding are:

Use descriptive and consistent labels and titles for tables and charts. For example, the table showing the number of electric vehicles in India could have a title like “Number of registered electric vehicles in India by fiscal year (in lakhs)” and a source like “Source: Ministry of Road Transport and Highways”. The chart showing the shift in the class of electric vehicles could have units like “(in percentage)” and a legend like “E-rickshaws: blue; Two-wheelers: orange”.
Use appropriate scales and projections for maps. For example, the map showing the inter-State disparity in the penetration of electric vehicles could use an equal-area projection, which preserves the area and proportion of States. The map could also use a legend or a scale to show what the color gradient means, such as “Share of electric vehicles (in percentage)”.
Use short and clear sentences and paragraphs for text. For example, the article could use bullet points or subheadings to organize the main points and topics. The article could also use transitions or connectors to link different paragraphs and sentences, such as “However”, “For instance”, “In contrast”, etc.

My charts: Since the data I could obtain was very limited, I could recreate only a limited number of charts.

1) data source

Instead of an area chart, a bar chart is clearer and uses less pixel density.

Visualization link

2) data source snapshot-1690743154777

Visualization link

Invincyble commented 1 year ago

Vineeth Reddy Donthi 21f1004514

Article: A third of Central University teaching positions lying vacant

Authors: Maitri Porecha, Vignesh Radhakrishnan

Intent of original story:

The story the author is trying to tell in the article is about the high vacancy rate in teaching positions across Central Universities in India. It highlights the following key points:

Vacant Teaching Positions: Over 30% of teaching positions are lying vacant in 45 Central Universities in India. Out of 18,956 sanctioned teaching positions, around one-third (6,028) were vacant as of February.
State-wise Vacancies: The article presents a state-wise split of vacancies, with Odisha's central universities having the highest vacancy rate of 88%. Jammu & Kashmir U.T. and Tripura are also regions with over half the teaching positions vacant, while Mizoram and Kerala have the lowest vacancy rates at 15% or less.
Caste-wise Data: The article provides data on vacancies based on caste categories, indicating that vacancies are higher in OBC, SC, ST, EWS, and Persons with Disabilities categories compared to the General Category.
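As a quick check of the headline figure, the vacancy rate can be computed directly from the two numbers quoted above. This is just my own arithmetic sketch; the variable names are hypothetical.

```python
# Figures quoted in the article (as of February).
sanctioned = 18_956
vacant = 6_028

filled = sanctioned - vacant
vacancy_rate = vacant / sanctioned * 100  # percentage of sanctioned posts that are vacant

print(f"{vacancy_rate:.1f}% vacant, {filled} filled")  # prints "31.8% vacant, 12928 filled"
```

This is consistent with both "over 30%" and "around one-third" in the story.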

Dataset:

The authors made use of a dataset that included the state's name, the university's name, the number of sanctioned posts, and the number of vacancies in both teaching and non-teaching positions at all central universities. There were both categorical and quantitative variables in the data.

Gaps in the Dataset: The authors have confined their visualisations to the broad level of filled versus vacant posts. They could have made use of the various types of positions offered in the universities. In addition, they have concentrated on data from the top ten states/central universities and ignored the remainder. The visualisations might have shown a more thorough degree of detail if they had used data from all states and positions.

Author's chart 1:

My visualization:

Improvement:

The chart failed to show the proportion of vacant posts, which was the main intent of the story. I chose a donut chart to visualise the number of vacant posts compared to total posts.

Author's chart 2:

My visualization:

Improvement:

The data visualised here shows the State-wise split of the share of vacant posts in central universities using a tree map. Though the numbers are being depicted, the sizes of the blocks are misleading. I've used a population pyramid to signify the magnitudes of the number of posts in each category across all the universities.

Author's chart 3:

My visualization:

Improvement:

A similar visualisation has been used here, and the data can be better represented using a stacked column chart. This kind of visualisation emphasises the proportion of vacancies in each university, thereby giving us a clear picture.

Author's chart 4:

My visualization:

Improvement:

In this chart, a simple bar chart has been chosen to represent the caste-wise split of the share of vacant posts in Central Universities. Even though a bar chart is a good choice for simply visualising the vacancy percentage in different universities, it does not say enough about the posts. Using a grid of pie charts, I was able to depict both the proportion of vacancies and the comparison across different universities.

Criticism:

The data could have been put to more use instead of filtering to the top states or confining the hierarchy to only one level. The choice of charts has constrained the visualisations from telling the actual story.

dipak-patil-iitm commented 1 year ago

Data | Who does India’s rice export ban impact the most?

Dipak Patil 21f1004451

You can visit and read about the Article here

Date Published: July 28, 2023, 07:43 pm | Updated July 29, 2023, 07:16 pm IST

Authors: JASMIN NIHALANI

What is the story the author is trying to tell?

India is the world's largest exporter of rice, and its export ban will have a significant impact on many countries that rely on Indian rice imports. The original story is about the impact of India's rice export ban on Asian and African nations, where rice is a staple food. The ban was implemented to ensure sufficient availability of rice in the Indian market and control rising prices domestically.

The author uses data from the Indian government, the World Bank, and the Food and Agriculture Organization of the United Nations to justify her points. By examining data on rice production, exports, and import patterns, the article aims to reveal the complex interdependence between nations in the global rice trade.

The article concludes by mentioning that the ban on non-basmati white rice exports might bring relief to Indian consumers, particularly in the southern states, where rice prices had been rising due to climate change-related disasters impacting rice sowing. The author addresses the importance of non-basmati white rice in India's export market, pointing out that its export volume surpassed that of basmati rice in the previous two fiscal years. Over 140 countries purchased non-basmati white rice from India, and the ban is expected to have a significant impact on Nepal, Bangladesh, various African countries (Madagascar, Benin, Kenya, Ivory Coast), Asian countries (Malaysia, Vietnam), and the UAE, all of which were major buyers of this type of rice.

Overall, the article provides insights into the consequences of India's rice export ban, with a focus on the reactions of NRIs in the U.S. and the impact on various countries heavily reliant on Indian rice imports.

What data he/she is using to tell the story? Describe its details -- type of data, extent of the data, dimensions of the data, gaps in the data, what data is essential and what is irrelevant.

Dataset Description: The author used a comprehensive dataset on "India's Rice Production, Exports, and Import Patterns" spanning the years 2015 to 2023. For various charts, the author has used different datasets. These include the quantity of the three different types of semi/wholly milled rice exported by India (in tonnes) over time, the average quantity of non-basmati white rice bought by the top 50 importers per year between FY19 and FY23, the retail price of rice (₹ per kg) for major cities between 2018 and 2023, and the usual area covered under rice (FY18 to FY23 average) for the week ending July 13 and July 20 compared to the actual area covered this year (FY24).

The sources mentioned by the author are the Commerce Ministry, the Agriculture Ministry, COMTRADE, and the Department of Consumer Affairs. However, the dataset link was not provided in the article, and I couldn't find all the data that covers the topics in the article.

Type of data: The data is quantitative. It includes the amount of rice that each country imports from India each year and the prices of rice.

Extent of the Data: The data covers the last eight fiscal years (up to FY23) for exports and FY19 to FY23 for imports of different countries from India, allowing for the analysis of long-term trends and the impact of India's rice export ban.

Dimensions of the Data: The data used in the article spans multiple dimensions, such as:

Quantities of different types of rice exports (non-basmati white rice, basmati rice, parboiled rice) from India to various countries.
Average quantity of non-basmati white rice bought by the top 50 importers per year.
Average quantity of semi/wholly milled rice (all three types combined) bought by different countries from India per year.
Retail prices of rice (₹ per kg) in select cities.
Usual area covered under rice cultivation for specific weeks and fiscal years.

Gaps in the Data:

Secondary Factors: The data does not cover other factors influencing food prices and food security in importing countries, such as import tax, or transportation costs.
Granularity of Import Data: For some nations, the dataset lacks extensive information on rice imports, thereby leading to less accurate assessments of specific nation-level impacts.
Rice Quality and Varieties: The dataset does not differentiate between different rice varieties and their quality. Different rice varieties have distinct market demands and prices, and changes in their trade could have varying impacts on importing nations.

Essential vs. Irrelevant Data:

The essential data in this context would be the historical trends of India's rice production, exports, and import quantities, as well as the corresponding food price index data for importing countries after the export ban. The statement "NRIs have been resorting to panic buying rice in the U.S. despite the country’s relatively low dependence on imports from India" is irrelevant data.

Visualization of the data

Chart 1

The exported share of non-basmati white rice surpassed the share of basmati rice in the last two fiscal years (Chart 1). In FY23, India exported around 64 lakh tonnes of non-basmati white rice and close to 45 lakh tonnes of basmati rice. The most widely exported type was parboiled rice (78 lakh tonnes). Now, non-basmati white rice, which formed over a quarter of semi/wholly milled rice, has been taken off the market.
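The "over a quarter" claim can be checked against the three FY23 figures quoted above. This is a rough sketch that assumes these three types make up all of the semi/wholly milled rice exports and uses the rounded values from the text; the variable names are my own.

```python
# Approximate FY23 export quantities quoted in the article, in lakh tonnes.
exports_lakh_tonnes = {
    "non-basmati white": 64,
    "basmati": 45,
    "parboiled": 78,
}

total = sum(exports_lakh_tonnes.values())
shares = {rice: qty / total * 100 for rice, qty in exports_lakh_tonnes.items()}

for rice, share in shares.items():
    print(f"{rice}: {share:.1f}%")
```

Under that assumption, non-basmati white rice comes to roughly a third of the FY23 total, which agrees with "over a quarter".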

Improvements

The Y-axis label could be "Quantity of Rice (in Tonnes)".
The X-axis label could be "Year".
The author could have used a stacked bar chart instead of a line chart. It would have been more readable.

Chart 2

Chart 2 shows the average quantity of non-basmati white rice bought by the top 50 importers per year between FY19 and FY23.

Improvements

The chart could be made more informative by including the total quantity of non-basmati white rice imported by each country each year. This would give users a better understanding of the overall impact of the ban on each country.
The author could have used a geospatial map for the visualization, which would have covered all 140 countries instead of only the top 50.

Chart 3

Chart 3 shows the average quantity of semi/wholly milled rice (all three types together) bought per year between FY19 and FY23. The bigger the circle, the higher the dependency of a country on India for rice.

Improvements

The chart could be made more informative by labeling each country by name or by using a different color for each country.
The chart could be made more informative by showing the average quantity of semi/wholly milled rice for Sudan and the other countries on the same line.
The chart looks very cluttered; a geospatial map could have been used instead, which would have covered all 140 countries.

Table 4

Table 4 shows the retail price of rice (₹ per kg) for select cities. Back in India, the decision may bring relief to consumers as many of them, especially in the southern States, were paying over ₹50 for a kilo of rice as shown in Table 4.

Improvements

The article does not mention inflation in India.
The data could be made more informative by using a heatmap or line chart instead of a table.

Table 5

Table 5 shows the usual area covered under rice (FY18 to FY23 average) for the weeks ending July 13 and July 20, compared to the actual area covered this year (FY24).

Improvements

The table does not mention the unit of the area covered under rice.
I could not understand the table or the purpose of including it here.

Overall, the author conveys the intent of the story through the visualizations. However, the selection of charts could have been improved to enhance the quality of the article and convey more information.

Thank you!!!

Prahlad19 commented 1 year ago

Karnataka election results 2023 | Party-hoppers who joined the Congress had more success than those who were fielded by BJP

URL: https://www.thehindu.com/data/karnataka-turncoats-bjp-congress-elections/article66842730.ece

Reviewed By: Prahlad Singhania 21f1006059

Published By: THE HINDU DATA TEAM May 13, 2023 01:53 pm | Updated 06:56 pm IST

Motive of Author’s Story: The author wants to assess the success of different types of candidates for both major parties, using strike rate as the evaluation metric, and thus identify which party gained more from strategies such as fielding turncoats or new candidates.

Dataset Info: The visualization covers several types of candidates from the two major national parties, the Bharatiya Janata Party (BJP) and the Indian National Congress (INC), that contested the recent Karnataka Legislative Assembly elections. The candidate types are Freshers (candidates who have recently joined politics), Repeaters (candidates who contested the previous election for the same party), Current MLAs (also referred to as sitting MLAs, who won their seats in the previous election) and Turncoats (candidates who contested the previous election for another recognized party). It also contains the strike rate, as a percentage, for each type of candidate in this election. For turncoats, the strike rate is the number of seats won by turncoats as a share of seats contested by turncoats; the same definition applies to the other candidate types.

Key findings by Author: The strike rate of freshers is similar for both parties. Among repeaters, however, the Congress had a much better strike rate. Among sitting MLAs, the Congress again had a better strike rate. In this election, 18 of the BJP candidates were turncoats, while the Congress fielded 23 such candidates. The Congress achieved a much higher strike rate with its turncoats, with 69.6% of them leading in their seats, while the BJP’s turncoats had a much lower strike rate of 5.6%.
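
The strike-rate metric described above can be sketched in a few lines. The counts of turncoats fielded (18 and 23) come from the article; the win counts (1 and 16) are assumptions back-calculated from the reported 5.6% and 69.6%, and the function name `strike_rate` is illustrative.

```python
def strike_rate(seats_won, seats_contested):
    """Seats won as a percentage of seats contested."""
    if seats_contested <= 0:
        raise ValueError("seats_contested must be positive")
    return 100 * seats_won / seats_contested


# "contested" values are from the article; "won" values are assumptions
# back-calculated from the reported strike rates, not published figures.
turncoats = {
    "BJP": {"won": 1, "contested": 18},
    "Congress": {"won": 16, "contested": 23},
}

rates = {
    party: round(strike_rate(c["won"], c["contested"]), 1)
    for party, c in turncoats.items()
}
print(rates)
```

The same function applies unchanged to freshers, repeaters and sitting MLAs once their counts are available.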

Supporting Visualizations:

Author’s Visualization-1: Chart 1 shows the number of BJP candidates who are turncoats, freshers, repeaters and current MLAs in the 2023 Karnataka elections and their strike rates in the polls.

Improvements: The data table does not convey much about the effect of turncoat candidates relative to the other candidate types for the BJP. A different visualization might better depict the inference the author wants to convey from these findings.

Reviewed Visualization:

Author’s Visualization-2: Chart 2 shows the number of Congress candidates who are turncoats, freshers, repeaters and current MLAs in the 2023 Karnataka elections and their strike rates in the polls.

Improvements: A similar strategy should be used here for the Congress candidates in order to derive insights about the performance of turncoat candidates.

Reviewed Visualization

Author’s Visualization-3: A comparison of the strike rates of various candidate-types between the BJP and the Congress is shown in Chart 3.

Improvements: A table is not an effective way to compare the performance of two or more entities. A 100% stacked column chart is an effective visualization for comparing party performance on a given parameter.

Reviewed Visualization

blackpearl006 commented 1 year ago

Ninad Aithal 21f1006030

E-rickshaws to two-wheelers: The shift in the share of electric vehicles

Link to Article

Summary:

The author aims to visually demonstrate a significant transformation in the types of electric vehicles as their popularity rises in the country. While e-rickshaws previously held the majority market share, they have been overtaken by two-wheelers in the last four fiscal years. Additionally, the penetration of electric vehicles varies widely across states.

Dataset Description of the Source :

The authors utilize data from the Vahan Sewa Dashboard, which provides a comprehensive dataset regarding vehicle registration and types, along with related fields, used in the above-mentioned article. The dataset spans from 2019 to July 2023 and is categorized by state, offering a detailed view of various aspects such as:

Vehicle registration
Number of transactions
Revenue collection
Permits
Tax defaulters as of the latest date

The dataset is structured as a time series, where each month is considered a distinct time frame. It not only contains yearly growth rates but also offers visual representations, such as bar graphs, showcasing the top 5 states and their respective subdivisions. These visuals provide a clear and concise way to analyze the trends and changes in the dataset over time.

Dimensions of the dataset in the article :

Number of Electric Vehicles: This dimension provides insights into the total market of electric vehicles in India and shows the cumulative count of electric vehicles. It helps demonstrate the overall upward trend of electric vehicles in the country.
Percentage of Electric Vehicles: This dimension compares the number of electric vehicles to the total number of vehicles on a state-wise basis. It serves to identify any disparities in the adoption of electric vehicles across different states and also highlights the penetration of the electric vehicle market in each state.
Electric Vehicles (Non-e-rickshaws): This dimension specifically focuses on electric vehicles that are not e-rickshaws. It differentiates between two-wheelers and other electric vehicle types, emphasizing the shift in trend from e-rickshaws to two-wheelers.
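
The "Percentage of Electric Vehicles" dimension above is a simple ratio per state. The following is a minimal sketch of that calculation; the state names and registration counts are hypothetical placeholders, not values from the Vahan Sewa Dashboard, and `ev_share` is an illustrative name.

```python
def ev_share(ev_registered, all_registered):
    """Electric vehicles as a percentage of all registered vehicles."""
    if all_registered <= 0:
        raise ValueError("all_registered must be positive")
    return 100 * ev_registered / all_registered


# Hypothetical registrations per state: (electric, all vehicles).
registrations = {
    "State A": (12_000, 1_000_000),
    "State B": (45_000, 1_500_000),
    "State C": (3_000, 600_000),
}

# Rank states from highest to lowest EV penetration.
ranked = sorted(
    ((state, ev_share(ev, total)) for state, (ev, total) in registrations.items()),
    key=lambda item: item[1],
    reverse=True,
)
for state, share in ranked:
    print(f"{state}: {share:.2f}%")
```

Ranking by share rather than by raw counts keeps large states from dominating the comparison.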

The data provided in the article is up to date as of July 14, 2023, making it a recent and relevant source for analyzing the current state of the electric vehicle market in India.

Gaps

The article does have some gaps and limitations in the data presentation, which are as follows:

Lack of Comparison with Non-electric Vehicles: The article does not visually compare the number of electric vehicles with non-electric vehicles (such as petrol or diesel vehicles). This omission hinders understanding the relatively small impact or market share of electric vehicles, which might be just around 2% of the overall vehicle market.
Ambiguity in Chart 2: Chart 2, depicting electric vehicles as a share of all other vehicles registered in each state, lacks clarity in conveying its meaning. Additionally, the chart does not include data for all states, and the color scheme used lacks significance, making it difficult to interpret the information effectively.
Complexity of Chart 4: Chart 4 contains a significant amount of information, making it challenging to grasp its insights at a glance. Furthermore, the bubble sizes in the chart are not adequately labeled, leaving the reader uncertain about what each bubble represents.

Essential Data

Number of Electric Vehicles: This is vital data as it directly reflects the growth and popularity of electric vehicles in the country.
Percentage of Electric Vehicles: This data is essential for understanding the market penetration of electric vehicles and identifying any disparities across states.
Electric Vehicles (Non-e-rickshaws): This data provides valuable insights into the trend shift from e-rickshaws to other electric vehicle types, such as two-wheelers.
Time Frame: The data being up-to-date as of July 14, 2023, is relevant as it ensures that the analysis reflects recent trends and developments.

Charts in the Article

The charts in the article are very nicely drawn, and scraping data from the Vahan Sewa website is very difficult: there are no APIs, and only 5 years of data can be retrieved at a time. There are a few areas for improvement, which I have noted alongside each chart.

Chart 1

Original Data :

Advantages and Disadvantages: The cumulative Flow graph effectively represents the Indian market trend and serves as an excellent visual presentation. However, it lacks proper labeling for the X and Y axes, and the units for the Y axis are not specified. Adding permanent labels for each year would enhance the graph's informational value. Furthermore, providing details about the methodology used to predict the trend for 2024 would greatly improve its credibility.

Chart 2

Advantages and Disadvantages: The current visual representation is flawed, as its use of circles may mislead readers and it fails to convey that electric vehicles constitute only around 2% of the total vehicles in a state. A more suitable alternative would be a map of India in which each state is colored by its percentage share of electric vehicles (a choropleth). This would provide a clearer and more informative view of the data.

Chart 3

Original Data :

Advantages and Disadvantages:

The stacked area chart in the article is highly effective in illustrating the diminishing dominance of e-rickshaws and the increasing share of two-wheelers, along with other important electric vehicle categories. The chart benefits from clear color coding and legends, making it easy to interpret. However, there are areas for improvement. Firstly, labeling the axes would enhance the overall clarity of the visualization. Secondly, the chart does not specify whether the Y-axis represents the cumulative number of registered electric vehicles or the number of electric vehicles registered in each year. Additionally, it does not clarify whether the years are financial years or calendar years. Addressing these points would significantly enhance the chart's informational value.

Chart 4

Advantages and Disadvantages: The data presented in the chart is overly complex and confusing, leading to difficulty in comprehending its message. While more information is offered below the chart, it appears redundant after analyzing the preceding charts. It does provide comparisons between non-e-rickshaws, but it does not directly contribute to showcasing the shift in trend from e-rickshaws to two-wheelers.

priyanka-maz commented 1 year ago

Balasore tragedy | Decline in train accidents despite lower budgeting for train safety

Priyanka Mazumdar 21F1000367

Link to Hindu Article - https://www.thehindu.com/data/data-balasore-tragedy-data-reveals-decline-in-train-accidents-but-indian-railways-safety-expenses-remain-low/article66937787.ece

Story that the Article Represents

The recent Balasore accident caused a huge number of casualties. This led the author to investigate the Indian Railways' year books and annual reports, as well as Parliamentary Standing Committee reports, to find a relation between the number of accidents per year and the Union budget allotted to the safety and repair of railway trains.

Gaps in their visualization

Although the overall number of accidents is declining, Chart 1 plots each category of accident as a separate line with its own fluctuations, so it is not apparent whether the total number of train accidents has declined.
The Union's total budgetary spending on railway safety was shown neither as a cumulative sum nor on the same axes as the accidents, so the two were hard to compare. This made the original author's point (that train accidents declined despite lower budgeting for train safety) difficult to follow.
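
The first gap can be addressed by collapsing the per-category series into a yearly total and a running (cumulative) total. The snippet below is a minimal sketch of that aggregation; the category names and counts are placeholders, not figures from the Indian Railways reports, and `yearly_totals` is an illustrative name.

```python
from itertools import accumulate

# Placeholder counts per accident category for four consecutive years
# (NOT real railway data).
accidents_by_category = {
    "Derailments": [60, 50, 40, 35],
    "Collisions": [5, 4, 3, 2],
    "Level crossing": [20, 15, 10, 8],
}


def yearly_totals(series_by_category):
    """Sum all categories year by year."""
    return [sum(values) for values in zip(*series_by_category.values())]


totals = yearly_totals(accidents_by_category)
cumulative = list(accumulate(totals))  # running total across years

print("Total per year:", totals)
print("Cumulative:", cumulative)
```

Plotting `totals` as a single line makes the overall decline visible even when individual categories fluctuate.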

My retelling of their story

Chart 1 - Year-wise cumulative number of accidents

The Hindu's Visualization

Screenshot 2023-07-31 at 11-22-35 Data Odisha train accident and the declining focus on safety by the Indian Railways

My Visualization

Link to Visualization - https://public.flourish.studio/visualisation/14594411/

Year-wise cumulative number of accidents

Chart 2 - Total Spending on Railways as % of budgetary support for capex and Yearwise No. of Accidents

The Hindu's Visualization

Screenshot 2023-07-31 at 11-38-51 Data Odisha train accident and the declining focus on safety by the Indian Railways

My Visualization

Link to Visualization - https://public.flourish.studio/visualisation/14594573/

Total Spending on Railways as % of budgetary support for capex and Yearwise No of Accidents

Conclusion

Although my Chart 2 (a comparison line chart) upholds the original author's point, it is still unclear whether the recent uptick in major railway accidents was directly caused by the Union repeatedly lowering the railway safety budget in response to the slow decline in train accidents over the last decade. More data could be analyzed and visualized to test the author's point, for example by plotting railway accidents as a location-wise choropleth. This might reveal that the cause is not the decline in spending but misappropriation of funds through neglect or corruption in certain areas, while other areas have benefited from better safety technology and hence needed lower spending.

Shreyays commented 1 year ago

Shreya Y
21f1002768

Link to article: https://www.thehindu.com/data/data-close-to-half-of-cases-in-hcs-pending-for-over-five-years/article66716280.ece

What is the story the author is trying to tell?

In the article, the authors are trying to draw attention to the proportion of cases pending for more than 5 years in High Courts of India. They also analyse and comment on these trends in states where the share of cases pending is large. The authors infer that the cause for 50% of the cases pending for more than 5 years is primarily the increase in the number of cases, without a commensurate increase in the number of judges presiding over those cases.
The authors have used statistics such as case clearance % by state, the average clearance rate vs vacancy % as of December 2022 and a comparison of the average cases pending per judge in 2017 vs 2022, to bring out the lags in case clearances. The authors conclude by suggesting that alternate dispute resolution mechanisms should also be pursued to reduce the burden of courts.

Data used to tell the story

The authors have used data from the India Justice Report, which in turn derives its data from the National Judicial Data Grid. The data points used are the case clearance rate by state from 2018 to 2022, the % of vacancies by state, and the derived data point of average cases pending per judge. The data is presented as a table, a scatter plot and column charts.

Dimensions of the data

Case clearance % by state: The case clearance % helps in comparison of case clearance rates by state high courts and for each state and overall, across years.

Case clearance and judge vacancy: The chart on judge vacancy vs case clearance, indicates the positioning of a state in the two-dimensional field of case clearance rate and judge vacancy, dividing them into broadly four quadrants based on their relationship.

Number of cases pending per high court judge: In this chart, one dimension is numerical (average cases pending per High Court judge) and the other is categorical data (State). The data encompasses 2 different representations for two years, enabling comparisons on a single axis.

Gaps

The authors intend to convey that close to half of the cases in high courts are pending for over five years. However, there is no data/there are no charts indicating % of cases pending by time period.

While the data highlights the % clearance of cases by state, it is not appropriately encoded so as to draw attention to states which are doing relatively better/poorer.

The chart on case clearances vs the vacancy % of judges shows the positioning of states on these two parameters; however, it fails to capture the volume of new cases received by each state, which according to the article could be a reason why Tripura and Manipur, though they had relatively high vacancy, had a lower clearance rate.

Suggested Improvements

The case clearance rate for each state from 2020 to 2022 has been presented as choropleths, which allow for easier visualisation of statistics across states, along with a geographic reference for these statistics. The colours are encoded from the lowest % clearance to the highest % as orange to green.

Map 2020 data Map 2021 Map 2022
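
The orange-to-green encoding described above amounts to linear interpolation between two colours. The snippet below is a minimal sketch of that mapping; the RGB endpoints, the 50% to 150% clearance range, and the name `clearance_color` are assumptions for illustration, not the settings used in the maps.

```python
def clearance_color(value, vmin, vmax,
                    low_rgb=(255, 165, 0),   # orange (assumed endpoint)
                    high_rgb=(0, 128, 0)):   # green (assumed endpoint)
    """Map a value in [vmin, vmax] to a hex colour by linear interpolation."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp out-of-range values to the endpoints
    rgb = tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low_rgb, high_rgb))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Hypothetical clearance rates (%) on an assumed 50-150% scale.
for rate in (50, 100, 150):
    print(rate, clearance_color(rate, 50, 150))
```

Using the same `vmin` and `vmax` for all three years keeps the colours comparable across the maps.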

The average case clearance rate for each state over the five years from 2018 to 2022 has been presented as a dynamic visualisation of line charts, in which the movement over the five years can be read off directly, without having to read individual entries in the table. The chart can be filtered by state, reducing the clutter and keeping only the data relevant to the user.

Link to chart: https://public.flourish.studio/visualisation/14601513/

The chart containing the vacancy rates vs case clearance has been modified to include the number of cases as the size of the circles. One can now evaluate whether the states where there are high case clearances despite high vacancies are due to low volumes of cases registered.

Vacancy vs clearance
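
When case volume is encoded as circle size, a common convention is to make the circle's area, not its radius, proportional to the value, so that large states are not visually exaggerated. The snippet below is a minimal sketch of that scaling; the case counts and the name `bubble_radius` are hypothetical, and the source does not state how the charting tool scales its circles.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Radius such that the circle's area is proportional to value."""
    if max_value <= 0 or value < 0:
        raise ValueError("max_value must be positive and value non-negative")
    return max_radius * math.sqrt(value / max_value)


# Hypothetical numbers of cases per state (placeholders, not real data).
cases = {"State A": 100_000, "State B": 25_000, "State C": 4_000}
largest = max(cases.values())

radii = {state: bubble_radius(n, largest) for state, n in cases.items()}
for state, radius in radii.items():
    print(f"{state}: radius {radius:.1f}")
```

With this scaling, a state with a quarter of the cases gets half the radius and therefore a quarter of the area.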

VarnikaRB commented 1 year ago

Name: Varnika Bagaria

Roll no.: 21f1007039

Article used: Link

Story that article tells us:

The data presented in the article indicates that the core sectors of India's Index of Industrial Production (IIP) experienced overall growth in June, with seven out of eight sectors showing improvement compared to the previous month. The IIP growth had reached a three-month high of 5.2% in May, and economists expect a similar growth rate of 4%-6% in June. The key points from the data are as follows:

Seven out of eight core sectors showed an uptick in June compared to just six in May.
Crude oil was the only sector that contracted for the 13th consecutive month, but the rate of decline eased to 0.6%.
Coal production and cement production saw substantial growth rates of 9.8% and 9.4%, respectively.
Some sectors experienced a decline from May levels, including fertilisers, refinery products, coal, and crude oil.

Economists attribute the overall growth to the government's focus on infrastructure development, especially in roads, which is reflected in strong numbers for steel and cement. The delayed onset of the monsoon also contributed to improved performance in sectors like electricity and coal.

The data suggests that the economy is experiencing broad-based growth, driven by infrastructure spending and favorable conditions in certain sectors. Despite some moderation in year-on-year performance in various high-frequency indicators, the growth in the IIP is expected to remain positive for June.

Gaps in visualization

Each industry is shown in a separate chart, so the growth of different industries cannot be compared at a glance and takes time to analyze; combining them in one chart would help.
With just percentages, it is difficult to know the total amount of growth.

Chart

According to The Hindu Data Point

Coal

Crude oil

Fertilizers

Other Refinery products

According to my visualization
