directus-labs / guest-authoring

A repo for our guest authors to work on content

Building a Web Scraping Extension with Playwright #209

Closed Okeke12nancy closed 2 months ago

Okeke12nancy commented 4 months ago

What is your idea?

Introduction
- Overview: Brief introduction to the project and its purpose.
- Goals: Outline what the reader will learn and achieve by the end of the article.
- Technologies Used: List and briefly describe the technologies involved (Playwright, Directus, Node.js, Angular, AI).

Section 1: Project Setup
- Setting Up the Development Environment
  - Required tools and installations (Node.js, npm, Angular CLI, etc.)
  - Setting up a new Angular project
  - Initializing a Node.js project

Section 2: Integrating Directus
- Installing and Setting Up Directus
  - Installation process for Directus
  - Configuring the Directus backend with a SQL database
  - Creating collections for storing scraped data, summaries, and keywords
- Directus API Setup
  - Generating API endpoints with Directus
  - Managing roles and permissions

Section 3: Web Scraping with Playwright
- Introduction to Playwright
  - What is Playwright and why use it for web scraping
  - Basic Playwright setup and configuration
- Building the Scraper
  - Writing scripts to scrape company content
  - Handling dynamic content and pagination
  - Extracting and saving company logos

Section 4: Summarizing Content with AI
- Introduction to OpenAI GPT-4
  - Overview of GPT-4 capabilities and use cases
  - Setting up access to the OpenAI API
- Integrating GPT-4 for Summarization
  - Writing functions to send scraped content to the GPT-4 API
  - Processing and storing the AI-generated summaries and keywords in Directus

Section 5: Backend Development with Node.js
- Creating API Endpoints
  - Setting up Express.js for the backend server
  - Writing endpoints to interact with Directus and serve data to the frontend
  - Implementing business logic for evaluating companies
- Data Processing and Evaluation Algorithms
  - Designing algorithms to evaluate company content
  - Integrating evaluation logic with Directus data

Section 6: Frontend Development with Angular
- Building the Angular Application
  - Creating the basic structure of the Angular app
  - Setting up services to communicate with the backend APIs
  - Implementing components to display scraped data, summaries, and evaluations
- User Interface and Experience
  - Using Tailwind CSS or Bootstrap for styling
  - Adding interactivity and user-friendly features

Section 7: Authentication and User Management
- Setting Up Authentication
  - Using Directus for managing users and roles
  - Integrating Auth0 (optional) for advanced authentication features

Section 8: Deployment and DevOps
- Containerizing the Application
  - Using Docker to containerize the frontend, backend, and Directus services
  - Writing Dockerfiles and docker-compose configurations
- Orchestrating with Kubernetes
  - Setting up Kubernetes for managing containerized applications
  - Deploying the application to a cloud service (AWS, GCP, Azure)

Section 9: Monitoring and Maintenance
- Performance Monitoring
  - Integrating New Relic or Datadog for application performance monitoring
  - Setting up alerts and dashboards
- Regular Maintenance and Updates
  - Strategies for maintaining and updating the application
  - Backup and recovery plans for Directus data

Conclusion
- Summary of Achievements
  - Recap the main points covered in the article
  - Highlight the key functionalities of the tool built

What are the key takeaways from your post?

Readers will learn how to build an advanced web scraping and summarization tool using Playwright for scraping, Directus for content management, and OpenAI GPT-4 for AI-driven summarization. The project also covers integrating a Node.js backend with an Angular frontend, and best practices for deployment and monitoring.

Country of residence

Nigeria

github-actions[bot] commented 4 months ago

Thank you for submitting an idea for our guest blog.
We work through new ideas every few weeks as we put together our content schedule. This means you may not get an immediate response as to whether your idea has been accepted, or any follow-up questions we have to clarify your idea.
If your idea is accepted, we will provide a deadline for first draft and how much we can pay you for the post. You will have a few days to confirm whether you are still able and willing to write the post.
If you have any questions in the meantime, feel free to add a comment to this issue.

Okeke12nancy commented 4 months ago

@phazonoverload can I create more than one issue?

phazonoverload commented 4 months ago

@Okeke12nancy yes you can!

phazonoverload commented 4 months ago

This is way too much for one post, but the web scraping part is interesting. How do you feel about building a Directus extension which allows the user to provide a URL and scrape data from it? I've not fully formed the thought but that would be interesting.

Okeke12nancy commented 4 months ago

@phazonoverload, yeah, that seems interesting. Can we discuss this further on Discord or Slack? I need more information on the preferred tech stack, etc.

phazonoverload commented 4 months ago

I'd prefer to keep conversation here so it isn't locked away in DMs. I think this would be interesting as a Directus extension - so no separate frontend or application.

Okeke12nancy commented 4 months ago

@phazonoverload okay, that can work.

Should I go ahead and start writing the article? It's an advanced topic, so I need to start on time.

phazonoverload commented 4 months ago

We are still just discussing the idea and making sure that it works. When we review all of the submissions next week, I will let you know whether this was accepted with the revised topic, timelines, and budget :)

Okeke12nancy commented 4 months ago

Okay, noted

BB-Loft commented 3 months ago

Based on the agreed scope we can pay $500 for this post.

This post will be included as part of our August content schedule. The first draft is due on July 1 so we have time to properly review it and you have time to respond.

If interested and you’re happy to commit to the deadline, please let me know in the next few days and I will mark this issue as Approved.

The process is detailed in the README of this repo 😄

Okeke12nancy commented 3 months ago

Sure, I am interested and happy to commit to the deadline.

BB-Loft commented 2 months ago

Hi @Okeke12nancy, please can I get an update on when the first draft is going to be shared?

Okeke12nancy commented 2 months ago

Hello @BB-Loft, I have made a pull request already. Please check.

BB-Loft commented 2 months ago

Got it, thanks @Okeke12nancy 👍🏻

Okeke12nancy commented 2 months ago

@BB-Loft Hi, I haven't gotten any feedback on my draft. I want to know if I need to make further improvements.

phazonoverload commented 2 months ago

I've literally got it open now, so expect to review this evening or tomorrow morning.

Okeke12nancy commented 2 months ago

Hello @phazonoverload, I have pushed the latest changes.