vaibhavgeek closed this issue 4 weeks ago
Can you share a detailed tech stack plan for the classification? Creating the test dataset is not a critical task in itself; it only serves to validate the classification that the code performs. How you think about classification shows your understanding of this task, so please share a detailed plan for it.
Hi @vaibhavgeek. After extensive research and a few POCs, we have implemented three approaches for this Activity Classification Algorithm. They are as follows.
all-mpnet-base-v2
zero-shot-classification pipeline with the model of choice. Keep the multi_label flag false for better accuracy.
[[is this related to {activity1}, passage], [is this related to {activity2}, passage], ...] (BAAI/bge-reranker-v2-m3)
After analysing the above implementations, we came to a conclusion regarding the Keyword Extraction Approach.
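As a rough illustration of the zero-shot and reranker ideas listed above, here is a minimal sketch. It is not the code from the PR. The function names (`build_pairs`, `classify_with_reranker`, `classify_zero_shot`, `load_zero_shot_classifier`) are hypothetical, and `facebook/bart-large-mnli` is only an assumed stand-in for "the model of choice". The scoring functions are passed in as arguments so the logic can be exercised without downloading any model.

```python
from typing import Callable, Dict, List, Sequence


def build_pairs(passage: str, activities: Sequence[str]) -> List[List[str]]:
    """Build one [query, passage] pair per candidate activity."""
    return [[f"is this related to {activity}", passage] for activity in activities]


def classify_with_reranker(
    passage: str,
    activities: Sequence[str],
    score_pairs: Callable[[List[List[str]]], Sequence[float]],
) -> str:
    """Return the activity whose query/passage pair gets the highest score.

    `score_pairs` is any callable that maps a list of pairs to one score per
    pair. A cross-encoder reranker such as BAAI/bge-reranker-v2-m3 could be
    plugged in here (assumption: how it is loaded is up to the caller).
    """
    pairs = build_pairs(passage, activities)
    scores = list(score_pairs(pairs))
    best_index = max(range(len(activities)), key=lambda i: scores[i])
    return activities[best_index]


def classify_zero_shot(
    passage: str,
    activities: Sequence[str],
    classifier: Callable[..., Dict],
) -> str:
    """Return the top label from a zero-shot-classification style callable.

    With multi_label=False the scores are normalised across the candidate
    labels, and the returned "labels" list is sorted best-first.
    """
    result = classifier(passage, candidate_labels=list(activities), multi_label=False)
    return result["labels"][0]


def load_zero_shot_classifier(model_name: str = "facebook/bart-large-mnli"):
    """Load a Hugging Face zero-shot pipeline (model name is an assumption)."""
    from transformers import pipeline  # imported lazily; needs `transformers`

    return pipeline("zero-shot-classification", model=model_name)
```

With `transformers` installed, `classify_zero_shot(text, activities, load_zero_shot_classifier())` would run the real pipeline; for quick local checks, any stub callable with the same shape works.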
Can you share the test data you used to determine the accuracy?
Sure, here is the test data against which I benchmarked all of these approaches.
[
{
"TestCase": 1,
"Content": "Global Trading Platform - Trade Stocks, Forex, and Commodities. Home, Markets, Trading Tools, Education, Support. Live Market Updates: Real-time data on stock prices and currency exchange rates. Educational Webinars: Upcoming sessions on trading strategies. Economic Calendar: Key financial events and announcements. Risk disclosures and regulatory compliance information.",
"Activity": "Trading",
"Reasoning": "This content is centered on trading activities, providing services and information directly related to financial trading. The classifier should assign this content to 'Trading' based on the focus on stock markets, trading tools, and investment education."
},
{
"TestCase": 2,
"Content": "Innovate Design Studio - Crafting Visual Experiences. Portfolio, Services, Blog, About Us, Contact. Showcasing high-quality images of recent design projects. Our services include branding, web design, illustration, and UI/UX design. Case studies provide detailed accounts of successful design projects. Blog posts cover design trends, tutorials, and industry insights. Connect with us on Behance, Dribbble, and Instagram.",
"Activity": "Designing",
"Reasoning": "The content is specific to designing, focusing on creative services and showcasing design work. The classifier should assign 'Designing' as the activity due to the emphasis on design services and industry terminology."
},
{
"TestCase": 3,
"Content": "CineWorld - Your Gateway to the Latest Movies. Now Showing, Coming Soon, Trailers, Reviews, Tickets. Featuring upcoming blockbuster movie posters with 'Book Now' options. Top Picks: Editor's choice of must-watch films. User Reviews: Audience ratings and comments. Exclusive Interviews: Videos with directors and actors. Sign up for our newsletter and connect with movie industry partners.",
"Activity": "Movies",
"Reasoning": "This content revolves entirely around movies, including showtimes, reviews, and interviews. The classifier should assign the activity 'Movies' based on the entertainment-focused content."
},
{
"TestCase": 4,
"Content": "Department of Environmental Protection - Official Government Site. Programs & Services, Laws & Regulations, Newsroom, About DEP, Contact Us. Updates on environmental policies and initiatives. Public Notices: Information on upcoming hearings and public comment periods. Resources include access to permits, forms, and environmental data. Government logos, privacy policy, and accessibility statement.",
"Activity": "Government",
"Reasoning": "The content is specific to government activities, focusing on environmental policies and official services. The classifier should assign 'Government' as the activity due to the official nature and governmental context."
},
{
"TestCase": 5,
"Content": "Eventopia - Your Event Planning Partner. Services, Portfolio, Testimonials, Blog, Get a Quote. Showcasing beautifully decorated venues with happy attendees. Our services include wedding planning, corporate events, and social gatherings. Client testimonials from satisfied customers. Blog articles offer tips on event themes, budgeting, and vendor selection. Contact information and social media links.",
"Activity": "Planning",
"Reasoning": "This content is dedicated to planning, specifically event planning services. The classifier should assign 'Planning' as the activity based on the focus on organizing events and providing planning resources."
},
{
"TestCase": 6,
"Content": "SportsHub - All Things Sports in One Place. News, Live Scores, Teams, Players, Shop. Live Score Ticker with real-time updates on ongoing matches. Latest News: Breaking updates in the sports world. Featured Teams: Profiles and statistics. Merchandise Store: Official jerseys, equipment, and memorabilia. Subscribe for newsletters and access to premium content.",
"Activity": "Sports",
"Reasoning": "The content is entirely about sports, offering news, live scores, and merchandise. The classifier should assign 'Sports' as the activity due to the comprehensive sports-related content."
},
{
"TestCase": 7,
"Content": "Stadium Architects - Designing the Future of Sports Venues. Projects, Services, Insights, About Us, Contact. Showcasing time-lapse videos of stadium constructions. Featured Projects: Details on recently completed stadiums and arenas. Services include architectural design, structural engineering, and sustainability consulting. Insights: Articles on trends in sports venue design. Awards and recognitions in architectural excellence.",
"Activity": "Designing",
"Reasoning": "Although the content intersects designing and sports, the primary activity is designing, specifically architectural design. The classifier should assign 'Designing' as the activity due to the focus on design services and architectural projects."
},
{
"TestCase": 8,
"Content": "Collector's Paradise - Trading Cards Galore. Home, Categories, Auctions, Blog, Community. Featured Categories include sports cards (baseball, basketball, football), movie memorabilia (posters, scripts, limited edition items), and gaming cards (Pokémon, Magic: The Gathering). Upcoming Auctions: Schedule and featured items. Collector's Blog: Tips on valuing and trading collectibles. Membership signup and customer support links.",
"Activity": "Trading",
"Reasoning": "Despite involving sports and movies memorabilia, the primary activity is trading collectibles. The classifier should assign 'Trading' as the activity based on the emphasis on buying, selling, and auctioning items."
},
{
"TestCase": 9,
"Content": "City Planning Department - Shaping the Future of Our City. Services include urban planning, zoning, and development approvals. Public meetings and hearings schedule. Resources on city development projects. Contact information for community feedback.",
"Activity": "Government",
"Reasoning": "This content represents a government department focused on urban planning. The classifier should assign 'Government' as the activity due to the official context and government-related planning services."
},
{
"TestCase": 10,
"Content": "Financial Planning Advisors - Secure Your Future. Services, Our Approach, Resources, Blog, Contact Us. Images of families consulting with financial advisors. Our services include retirement planning, investment strategies, and wealth management. Resources: Market reports and investment calculators. Blog posts: Articles on financial trends and tips. Certifications and affiliations with financial regulatory bodies.",
"Activity": "Trading",
"Reasoning": "While 'planning' is mentioned, the context is financial planning related to investments and trading. The classifier should assign 'Trading' as the activity due to the financial and investment focus."
},
{
"TestCase": 11,
"Content": "National Film Board - Preserving Cinematic Heritage. Home, Collections, Education, News, About Us. Free streaming of classic films. Collections: Archives of historically significant movies. Educational Programs: Workshops and resources for schools. News: Government initiatives supporting the film industry. Government insignia and links to cultural departments.",
"Activity": "Movies",
"Reasoning": "Even though it's a government site, the primary activity is movies. The classifier should assign 'Movies' as the activity, focusing on cinematic content and film preservation."
},
{
"TestCase": 12,
"Content": "Coach's Corner - Training Plans for Athletes. Training Programs, Nutrition, Blog, About Us, Contact. Images of athletes in training sessions. Customized training plans for runners, swimmers, and cyclists. Nutrition guides: Meal plans to enhance performance. Success stories: Testimonials from athletes. Subscribe to our coaching newsletter.",
"Activity": "Sports",
"Reasoning": "The content is centered on sports training and athletic performance. The classifier should assign 'Sports' as the activity due to the focus on sports-related coaching and training programs."
},
{
"TestCase": 13,
"Content": "Creative Minds - Designing Innovative Products. Our team specializes in product design, from concept to prototype. Services include industrial design, 3D modeling, and usability testing. Portfolio showcases award-winning designs. Blog articles discuss the latest in design technology and materials.",
"Activity": "Designing",
"Reasoning": "This content is focused on product and industrial design. The classifier should assign 'Designing' as the activity due to the emphasis on creating and developing new products."
},
{
"TestCase": 14,
"Content": "Strategic Business Planning Services - Charting the Path to Success. Offering corporate planning, market analysis, and strategic development services. Case studies of successful business transformations. Resources include planning templates and industry reports.",
"Activity": "Planning",
"Reasoning": "The content revolves around business planning services. The classifier should assign 'Planning' as the activity due to the focus on strategic planning and organizational development."
},
{
"TestCase": 15,
"Content": "Sports Analytics Inc. - Data-Driven Insights for Athletic Performance. Services include performance analysis, game strategy optimization, and player scouting reports. Case studies showcase how analytics improved team outcomes. Blog posts on the latest trends in sports data science.",
"Activity": "Sports",
"Reasoning": "This content is dedicated to sports analytics, focusing on enhancing athletic performance through data. The classifier should assign 'Sports' as the activity due to the sports-specific application of analytics."
}
]
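For anyone who wants to reproduce an accuracy figure from test data in this shape, a small harness along the following lines could be used. This is a sketch under assumptions: `evaluate` and `load_test_cases` are hypothetical helper names, the file path is whatever the JSON above is saved as, and `classify` stands for any of the approaches wrapped as a function from text to activity name.

```python
import json
from typing import Callable, Dict, List, Tuple


def load_test_cases(path: str) -> List[Dict]:
    """Read the test cases from a JSON file (path chosen by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def evaluate(
    test_cases: List[Dict],
    classify: Callable[[str], str],
) -> Tuple[float, List[Dict]]:
    """Run `classify` on every case and return (accuracy, failures).

    Each case is expected to carry the "TestCase", "Content" and "Activity"
    keys used in the test data above.
    """
    failures = []
    for case in test_cases:
        predicted = classify(case["Content"])
        if predicted != case["Activity"]:
            failures.append(
                {
                    "TestCase": case["TestCase"],
                    "expected": case["Activity"],
                    "predicted": predicted,
                }
            )
    total = len(test_cases)
    accuracy = (total - len(failures)) / total if total else 0.0
    return accuracy, failures
```

Returning the failing cases alongside the accuracy makes it easier to see whether an approach trips on the deliberately overlapping cases, such as 7 to 11.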
Sounds good, create a pull request. The code needs to be called from https://github.com/Kleo-Network/connect-backend/blob/main/app/celery/tasks.py, so please ensure that.
Test the function independently in a separate file before adding it to tasks.py. I will then merge and close the PR. Awesome work.
@vaibhavgeek
The PR for this issue #30 was merged on 30 Sep 2024.
This issue now needs to be closed. Please close it at your convenience. Thanks.
Awesome work! Closing this now.
Step 0: Submit a plan for how you will implement this.
Step 1: The list of activities is mentioned below (there can be more than these). Each activity can be further associated with a few labels, against which a matching algorithm is run. Create test data for the text that you wish to classify under the following activities.
Step 3: Write a Python module to classify any given text into these activities. You can create new labels for each activity.
Step 4: Submit the PR for this, and write the function in this file: https://github.com/Kleo-Network/connect-backend/blob/main/app/celery/tasks.py
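To make the "labels plus matching algorithm" idea in the steps above concrete, here is a deliberately naive sketch. It is not the merged implementation. The activity names are taken from the test data in this thread, while the label lists, the function names, and the "Unknown" fallback are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical labels per activity; a real module would curate or learn these.
ACTIVITY_LABELS: Dict[str, List[str]] = {
    "Trading": ["trading", "trade", "stocks", "forex", "auctions", "investment"],
    "Designing": ["design", "designing", "branding", "illustration", "architectural"],
    "Movies": ["movie", "movies", "film", "films", "trailers", "cinematic"],
    "Government": ["government", "department", "regulations", "zoning", "permits"],
    "Planning": ["planning", "event", "events", "strategic"],
    "Sports": ["sports", "athletes", "scores", "teams", "players"],
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_activities(
    text: str, labels: Dict[str, List[str]] = ACTIVITY_LABELS
) -> Dict[str, int]:
    """Count how many tokens in the text match each activity's labels."""
    counts = Counter(tokenize(text))
    return {
        activity: sum(counts[word] for word in words)
        for activity, words in labels.items()
    }


def classify_activity(
    text: str,
    labels: Dict[str, List[str]] = ACTIVITY_LABELS,
    default: str = "Unknown",
) -> str:
    """Return the activity with the most label hits, or `default` if none hit.

    Ties are broken by the order of the activities in `labels`.
    """
    scores = score_activities(text, labels)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

Exact word matching like this is easy to test independently before wiring it into tasks.py, but it will likely struggle where activities overlap (for example a government planning department), which is where embedding-based or reranker-based scoring becomes attractive.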
Video explaining the bounty - https://www.loom.com/share/fcac9bc2742d437692fc60d5ca0e35a4?sid=c2495a45-87a1-459f-abfd-520ed430b4b4
Total Bounty - 200 USDC
Reviewer - @vaibhavgeek