CompuMasterGmbH / cammIntegrationPortal

camm Integration Portal (based on camm Web-Manager)
MIT License

New search concept #14

Open JochenHWezel opened 8 years ago

JochenHWezel commented 8 years ago

Search using the CMM crawler, but with a separate index for each server group and a security information descriptor on each record
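To make the idea of one index per server group with a security descriptor on each record more concrete, here is a minimal sketch in Python. It is not the project's implementation: the fields of `CrawlerRecord`, the `SearchIndexStore` class, and the rule that an empty descriptor means "public" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CrawlerRecord:
    url: str
    title: str
    text: str
    # Hypothetical security descriptor: names of user groups allowed to see
    # this record. An empty set is treated as "public" (an assumption).
    allowed_groups: frozenset = frozenset()


@dataclass
class SearchIndexStore:
    # One list of records per server group (hypothetical layout).
    indexes: dict = field(default_factory=dict)

    def add(self, server_group, record):
        self.indexes.setdefault(server_group, []).append(record)

    def search(self, server_group, term, user_groups):
        """Return records of one server group that contain the term and
        that the searching user is allowed to see."""
        term = term.lower()
        user_groups = set(user_groups)
        return [
            r for r in self.indexes.get(server_group, [])
            if term in r.text.lower()
            and (not r.allowed_groups or r.allowed_groups & user_groups)
        ]
```

Filtering by descriptor at query time keeps a single index per server group while still hiding records from users who lack the required membership.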

Cwm.Page implements a standard interface

CrawlerRecord class, as List

IsCrawlerRequest checks the User-Agent for an individual token per crawler user

cwm needs an additional user for running the crawling jobs
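The token check behind IsCrawlerRequest could look roughly like the following Python sketch. The User-Agent format `cwm-crawler/<user>;token=<token>`, the token table, and the function name `is_crawler_request` are assumptions for illustration only; the issue does not specify them.

```python
import hmac

# Hypothetical token table: crawler user name -> individual secret token.
CRAWLER_TOKENS = {"crawler-intranet": "s3cr3t-token-1"}


def is_crawler_request(user_agent, tokens=CRAWLER_TOKENS):
    """Return the crawler user name if the User-Agent carries a valid
    token for that user, otherwise None."""
    prefix = "cwm-crawler/"  # assumed User-Agent prefix
    if not user_agent or not user_agent.startswith(prefix):
        return None
    user, sep, token = user_agent[len(prefix):].partition(";token=")
    expected = tokens.get(user)
    if not sep or expected is None:
        return None
    # Constant-time comparison so the token cannot be guessed via timing.
    if hmac.compare_digest(token.encode(), expected.encode()):
        return user
    return None
```

Returning the crawler user name rather than a plain boolean lets the caller run the request under the dedicated crawl user mentioned above.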

crawler crawls

JochenHWezel commented 8 years ago

requires plugin concept

JochenHWezel commented 8 years ago

Pipeline

  1. crawler, sitemap scan list
  2. crawler record data
  3. split up into words
  4. usage by search form with logical parser
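Steps 3 and 4 of the pipeline can be sketched as a word-level inverted index plus a small recursive-descent parser for logical queries. This is an illustrative Python sketch, not the planned implementation: the function names, the `AND`/`OR`/`NOT` keywords, and the rule that adjacent words are combined with AND are assumptions.

```python
import re
from collections import defaultdict


def split_words(text):
    # Step 3: split crawled text into lower-case words.
    return re.findall(r"\w+", text.lower())


def build_index(records):
    """records: dict of url -> text. Returns dict of word -> set of urls."""
    index = defaultdict(set)
    for url, text in records.items():
        for word in split_words(text):
            index[word].add(url)
    return index


def search(query, index, all_urls):
    """Step 4: evaluate a query with AND, OR, NOT and parentheses."""
    tokens = re.findall(r"\(|\)|\w+", query)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def parse_or():
        nonlocal pos
        result = parse_and()
        while peek() == "OR":
            pos += 1
            result = result | parse_and()
        return result

    def parse_and():
        nonlocal pos
        result = parse_not()
        # Adjacent terms without an operator are treated as AND.
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":
                pos += 1
            result = result & parse_not()
        return result

    def parse_not():
        nonlocal pos
        if peek() == "NOT":
            pos += 1
            return all_urls - parse_not()
        return parse_atom()

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token == ")" or token.upper() in ("AND", "OR"):
            raise ValueError("unexpected token: " + token)
        return set(index.get(token.lower(), set()))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

A query such as `apple AND (red OR green)` is evaluated entirely with set operations on the index, so the crawled pages themselves are not touched at search time.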

JochenHWezel commented 8 years ago

Apps might provide a crawler setup for the standard search index of the server group. Additionally or alternatively, they might specify further search index names for the purpose of an in-app search index.

JochenHWezel commented 8 years ago

Pages being crawled by the crawler provide the data from the crawler setup as follows:

JochenHWezel commented 8 years ago

The crawler tracks page status: on repeated page errors for a URL, it stops crawling for a configured time
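The status tracking described above might be sketched as follows in Python. The class name `PageStatusTracker`, the error threshold, and the choice to pause only the failing URL (rather than the whole crawl) are assumptions; the issue leaves these details open.

```python
import time


class PageStatusTracker:
    """Pause crawling of a URL after repeated consecutive errors."""

    def __init__(self, max_errors=3, pause_seconds=3600.0, clock=time.monotonic):
        self.max_errors = max_errors        # hypothetical threshold
        self.pause_seconds = pause_seconds  # the "configured time"
        self._clock = clock
        self._errors = {}        # url -> consecutive error count
        self._paused_until = {}  # url -> time at which crawling may resume

    def record_result(self, url, ok):
        if ok:
            # A successful fetch resets the error history of this URL.
            self._errors.pop(url, None)
            self._paused_until.pop(url, None)
            return
        count = self._errors.get(url, 0) + 1
        self._errors[url] = count
        if count >= self.max_errors:
            self._paused_until[url] = self._clock() + self.pause_seconds
            self._errors[url] = 0

    def may_crawl(self, url):
        until = self._paused_until.get(url)
        return until is None or self._clock() >= until
```

Passing the clock in as a parameter keeps the pause logic testable without waiting for real time to elapse.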

JochenHWezel commented 8 years ago

The search page can be set up with a parser preset: either requiring that the user's search text is logically valid, or closing nested logic levels automatically on demand
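A minimal Python sketch of the two preset behaviours, assuming that "logic levels" means parenthesised groups. The function name `close_open_levels` and the `auto_close` flag are hypothetical.

```python
def close_open_levels(query, auto_close=True):
    """Validate parenthesis nesting in a search text.

    With auto_close=True, missing closing parentheses are appended.
    With auto_close=False, an unbalanced query is rejected instead.
    """
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                # A stray ")" cannot be repaired by appending characters.
                raise ValueError("closing parenthesis without opening one")
            depth -= 1
    if depth and not auto_close:
        raise ValueError(f"{depth} unclosed parenthesis level(s)")
    return query + ")" * depth
```

For example, with auto-closing enabled the input `a AND (b OR (c` becomes `a AND (b OR (c))`, while the strict preset rejects it.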

JochenHWezel commented 8 years ago

crawler should be able to crawl

JochenHWezel commented 8 years ago

CrawlerRequestSetup must configure

JochenHWezel commented 8 years ago

The crawler should consider meta robots (index, no-follow/follow), /robots.txt commands, and the sitemap listed in /robots.txt
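As a rough illustration of these checks, the following Python sketch uses the standard library's `urllib.robotparser` for /robots.txt rules and sitemap entries, and `html.parser` for the meta robots tag. The sample robots.txt content, the `example.com` URLs, and the helper names are made up for the example; `site_maps()` requires Python 3.8 or later.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""


def load_robots(text):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


class MetaRobotsParser(HTMLParser):
    """Collect index/follow permissions from <meta name="robots">."""

    def __init__(self):
        super().__init__()
        self.index = True
        self.follow = True

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            directives = {d.strip().lower() for d in content.split(",")}
            if "noindex" in directives or "none" in directives:
                self.index = False
            if "nofollow" in directives or "none" in directives:
                self.follow = False


def meta_robots(html):
    """Return (may_index, may_follow) for an HTML page."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.index, parser.follow
```

A crawler would call `load_robots(...).can_fetch(user_agent, url)` before requesting a page, read `site_maps()` to seed its scan list, and apply `meta_robots(html)` after fetching to decide whether to index the page and follow its links.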

JochenHWezel commented 8 years ago

It might make sense to have indexing use a second database for big environments/heavy load and use the cwm db only for small environments/light load

It might make sense to re-use other techniques/modules like Lucene or other engines