Closed: regismvargas closed this issue 3 months ago
This website might have anti-scraping protection, which is usually triggered by headless browsers. Try setting "headless": False in graph_config and let us know if you get a different result.
Thanks for your answer. It worked, but not as expected. See details below.
First: the original problem
Setting "headless": False leads to another error:
╔════════════════════════════════════════════════════════════════════════════════════════════════╗
║ Looks like you launched a headed browser without having a XServer running. ║
║ Set either 'headless: true' or use 'xvfb-run <your-playwright-app>' before running Playwright. ║
║ ║
║ <3 Playwright Team ║
╚════════════════════════════════════════════════════════════════════════════════════════════════╝
I worked around it this way:
!apt install xvfb
!pip install pyvirtualdisplay
import pyvirtualdisplay
display = pyvirtualdisplay.Display().start()
source and credit: https://colab.research.google.com/drive/1or8DtXZP8ZxJYK52me0dA6O9A1dXKKOE?usp=sharing
Worked.
Second: 'result' contains several "\n" sequences and other Markdown markup
I got the following:
{'title': 'Autonomous vehicles moving forward: Perspectives from industry leaders', 'description': 'McKinsey’s 2023 global executive survey on autonomous driving reveals that despite recent uncertainties, the autonomous-vehicle industry is beginning to take shape.', 'content': '**2023 was a tipping point** for the autonomous-vehicle industry. Although leading players were able to successfully run and scale first commercial operations and increase their funding, others saw significant setbacks, stopped or reduced their operations, or exited the market entirely. This in mind, there is still much to be done before the autonomous-vehicle industry is fully mature—but how much?\n\n## About the authors\nThis article is a collaborative effort by Derek Chiao, [Johannes Deichmann](http://www.mckinsey.com/our-people/johannes-deichmann), [Kersten Heineke](http://www.mckinsey.com/our-people/kersten-heineke), Ani Kelkar, Martin Kellner, Elizabeth Scarinci, and Dmitry Tolstinev, representing views from McKinsey’s Automotive and Assembly Practice and the McKinsey Center for Future Mobility.\n\nThis past summer, the McKinsey Center for Future Mobility conducted a follow-up to its 2021 survey of industry decision makers (see sidebar “Survey methodology”).[1Kersten Heineke, Ruth Heuss, Ani Kelkar, and Martin Kellner, “[What’s next for autonomous vehicles?](http://www.mckinsey.com/features/mckinsey-center-for-future-mobility/our-insights/whats-next-for-autonomous-vehicles),” McKinsey, December 22, 2021.](javascript:void\\(0\\);)\n\nOur 2023 survey revealed that much has changed in this dynamic sector in the past two years: regional expectations are shifting, timelines for autonomous-vehicle development are extending, and needed investments are increasing. 
Other results reveal new opportunities for autonomous-vehicle manufacturers, such as more diversified markets and technologies with margins of 17 percent or more.\n\n## Survey methodology\nThe McKinsey Center for Future Mobility, in partnership with The Autonomous, conducts a biannual survey of leaders in the autonomous-driving industry, which took place from June to August 2023. The 2023 survey included 86 decision makers from around the globe (40 from North America, 37 from the European Union, three from China, and six from other regions). They represented some of the world’s largest software and automotive corporations, as well as prominent start-ups and supporting institutions such as universities and mapping and navigation companies. These decision makers ranged from chief experience officers and heads of strategy to systems architects and vice presidents of engineering, together presenting a holistic view on the state of the industry. In some instances, results have been combined with the 2021 baseline to provide data that is analytically rich. In this article, we offer updated insights from industry leaders in key categories: regional and market diversification, predicted timelines, expected bottlenecks, the size of needed investments, profitability of autonomous-vehicle components, and monetization models. These results shine a light on how the autonomous-vehicle industry could take shape in the years and decades to come.\n\n## Players expect regional and market diversification\n\n## About the McKinsey Center for Future Mobility\n**These insights were developed** by the McKinsey Center for Future Mobility (MCFM). Since 2011, MCFM has worked with stakeholders across the mobility ecosystem by providing independent and integrated evidence about possible future-mobility scenarios. 
With our unique, bottom-up modeling approach, our insights enable an end-to-end analytics journey through the future of mobility—from consumer needs to a modal mix across urban and rural areas, sales, value pools, and life cycle sustainability. [Contact us](mmip@mckinsey.com) if you are interested in getting full access to our market insights via the McKinsey Mobility Insights Portal.\n\nMost survey respondents predict that three or less companies will capture a dominant share of the market. The North American market is expected to be the most fragmented, with only 15 percent of respondents expecting that the market will be dominated by one or two players. By contrast, 38 percent of respondents predict that the European market will be dominated by two or fewer players. Predictions for the race to full autonomy are also shifting: while 58 percent of 2021 survey participants believed that North America would be the first to deploy Level 4 (L4) highway pilots, 2023 respondents were evenly split between believing China or North America would be first. This is evidence of China’s progress in the autonomous-vehicle race, driven by factors such as robust government backing; heightened investments in research and data availability; and a receptive consumer attitude toward adopting new technology.\n\nExhibit 1 ![Most survey respondents expect the autonomous-vehicle market to be dominated by more than two players.](http://www.mckinsey.com/~/media/mckinsey/features/mckinsey%20center%20for%20future%20mobility/our%20insights/autonomous%20vehicles%20moving%20forward%20perspectives%20from%20industry%20leaders/svgz-autonomousvehiclessurvey-ex1.svgz?cq=50&cpy;=Center)\n\nWe strive to provide individuals with disabilities equal access to our website. If you would like information about this content we will be happy to work with you. 
Please email us at: [McKinsey_Website_Accessibility@mckinsey.com](McKinsey_Website_Accessibility@mckinsey.com)\n\n## The timeline for autonomous-vehicle development is extending\n\n## Stages of autonomous-vehicle development\nSAE International, a global professional association that develops engineering standards, splits autonomous-vehicle development into five levels, referred to as Level 0 (L0) through Level 5 (L5).[1 _SAE Blog_ , “SAE Levels of Driving Automation™ refined for clarity and international audience,” SAE International, May 3, 2021.](javascript:void\\(0\\);)\n\nL0 through Level 2 require humans to drive and constantly monitor automated support systems, which include warning systems, braking and acceleration, and steering. Level 3 (L3) vehicles are the highest level of automation widely available to consumers today. At this level, a car can operate independently, but systems can request that a driver take over at any time. These systems can operate only in certain conditions, such as during traffic jams. Level 4 (L4) vehicles, which include driverless taxis, are currently being tested, developed, and deployed. Unlike L3 vehicles, L4 vehicles function without a driver who is ready to take over. L5 vehicles are fully autonomous in any environment and under all conditions. These vehicles are the final frontier for autonomous-vehicle development.\n\nThe adoption timeline for autonomous vehicles has slipped by two to three years on average across all autonomy levels relative to the 2021 survey (see sidebar “Stages of autonomous-vehicle development”). According to this year’s survey, L4 robo-taxis are now expected to become commercially available at a large scale by 2030, and fully autonomous trucking is expected to reach viability between 2028 and 2031. This may be due to ongoing technical obstacles and challenges with capital availability. 
In addition, regulatory challenges persist as autonomous-vehicle regulations are still being developed and enacted.\n\nDespite these projections, well-funded pioneers are pushing ahead and moving to expand deployment across geographies.\n\nExhibit 2 ![Timelines for Level 4 and Level 5 autonomous-vehicle use cases have extended by two to three years on average.](http://www.mckinsey.com/~/media/mckinsey/features/mckinsey%20center%20for%20future%20mobility/our%20insights/autonomous%20vehicles%20moving%20forward%20perspectives%20from%20industry%20leaders/svgz_autonomousvehiclessurvey-exs_ex2-v6.svgz?cq=50&cpy;=Center)\n\nWe strive to provide individuals with disabilities equal access to our website. If you would like information about this content we will be happy to work with you. Please email us at: [McKinsey_Website_Accessibility@mckinsey.com](McKinsey_Website_Accessibility@mckinsey.com)\n\n## Regulation, technology, and consumer safety are key bottlenecks and considerations for development\n\nAbout 60 percent of respondents still believe regulation is the biggest bottleneck to autonomous-vehicle adoption, the same relative importance as in the 2021 survey. However, respondents this year reported an increased focus on technology, rising from an average of 26 percent in 2021 to an average of 32 percent in 2023. Though experts do not believe consumer demand will be the main impediment to adoption, autonomous-vehicle players still have important considerations to take into account to ensure consumer uptake. Two-thirds of respondents see improved safety as a key consideration for consumers. Productivity (the ability to multitask while driving) and comfort are anticipated to be secondary considerations in customers’ willingness to pay.\n\nExhibit 3 ![For leaders in the autonomous-'}}
As I saw in the documentation, the return should be clean, plain text. What am I doing wrong or missing here?
I'm glad the first solution worked, at least partially.
Setting "headless": False means that Playwright will open a graphical instance of Chromium, which only works in a graphical user environment. That's why you had to set up a virtual display of some sort (I guess you're using Colab?). Maybe we'll make a new Colab example about this. Thanks for the input.
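For anyone who hits the same XServer error, here is a minimal sketch of the workaround above that starts a virtual display only when one is actually needed. The helper names are hypothetical, and it assumes xvfb and pyvirtualdisplay are already installed as shown earlier in the thread; it is not something the library does for you.

```python
import os
import sys


def needs_virtual_display(environ=None, platform=None):
    """Return True when a headed browser would find no X server.

    Assumption: on Linux, a missing DISPLAY variable means no X server
    is reachable (this is the situation on Colab).
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    return platform.startswith("linux") and not environ.get("DISPLAY")


def start_virtual_display_if_needed():
    """Start an Xvfb-backed display if required; return it, or None."""
    if not needs_virtual_display():
        return None
    # Imported lazily so the check above works without the package.
    from pyvirtualdisplay import Display

    display = Display(visible=False, size=(1280, 720))
    display.start()
    return display
```

Call start_virtual_display_if_needed() once before building the graph with "headless": False, and call .stop() on the returned display (if any) when you are done.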
The second part of the problem is harder to tackle. I confirm that the output should be plain text. However, the answer is generated by an LLM, with all the problems that may stem from that, and our library uses the same system prompt for all LLMs. It took weeks of tinkering with the prompts just to reduce the number of invalid JSON responses, and even so the output sometimes looks odd. Unless we come up with separate prompts for each model, or with a custom LLM fine-tuned for scraping, this kind of unexpected behavior will keep showing up from time to time.
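Until the prompts improve, one option is to post-process the result yourself. Note first that the "\n" sequences appear partly because the dict is displayed via its repr; printing the string itself renders them as real line breaks. The sketch below is a rough cleanup, not part of the library: strip_markdown is a hypothetical helper, and it only handles simple links, images, headings, and bold markers. Nested links such as the footnotes in the output above will not come out clean.

```python
import re


def strip_markdown(text: str) -> str:
    """Reduce simple Markdown to plain text on a single line."""
    # Images: ![alt](url) -> alt (must run before the link rule)
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Links: [label](url) -> label
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Headings: drop leading '#' markers at the start of a line
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)
    # Bold: **text** or __text__ -> text
    text = re.sub(r"(\*\*|__)(.+?)\1", r"\2", text)
    # Collapse newlines and repeated spaces into single spaces
    return re.sub(r"\s+", " ", text).strip()
```

Assuming the result has the same keys as the output shown above, usage would look like strip_markdown(result["content"]).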
Describe the bug: All URLs from this domain return an empty result
To Reproduce Domain: http://www.mckinsey.com
URLs tested and not working:
https://www.mckinsey.com/features/mckinsey-center-for-future-mobility/our-insights/autonomous-vehicles-moving-forward-perspectives-from-industry-leaders
https://www.mckinsey.com/industries/automotive-and-assembly/our-insights/autonomous-drivings-future-convenient-and-connected
Prompt: Summarize and find the main topics
My code:
Steps to reproduce the behavior:
I got this from McKinsey URLs
Expected behavior