gbinal closed 1 year ago
On June 1, a new feature was deployed that also scans HTTP response headers for evidence of CMS usage. Below are the counts for CMS detection after this feature was introduced: a lot of new Drupal hits, but no additional CMSs detected.
The scan engine has been updated to detect Microsoft SharePoint via CMS headers. Updated totals are below.
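For readers following along, the header-based detection described above can be sketched roughly as follows. This is a minimal illustration, not the scan engine's actual code: the function name, the `HEADER_SIGNATURES` table, and the choice of headers (`X-Generator` and `X-Drupal-Cache` for Drupal, `MicrosoftSharePointTeamServices` for SharePoint) are assumptions based on headers those products commonly send.

```python
# Hypothetical signature table: lower-cased header name -> substring expected
# in the header value ("" means the header's presence alone counts as a hit).
# These are illustrative assumptions, not the scan engine's real rules.
HEADER_SIGNATURES = {
    "Drupal": [("x-generator", "drupal"), ("x-drupal-cache", "")],
    "SharePoint": [("microsoftsharepointteamservices", "")],
}


def detect_cms_from_headers(headers):
    """Return a sorted list of CMS names whose signatures match the headers."""
    # HTTP header names are case-insensitive, so normalize before matching.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    found = set()
    for cms, signatures in HEADER_SIGNATURES.items():
        for header_name, needle in signatures:
            value = normalized.get(header_name)
            if value is not None and needle in value:
                found.add(cms)
    return sorted(found)
```

A response carrying `X-Generator: Drupal 7 (http://drupal.org)` would be counted as Drupal under these assumed rules, while a response with only a generic `Server` header would produce no hit.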
Great progress!!!!!
As a next step, let's take the list of CMSs that Ben had previously been finding and look at how his code detects each one. Where the method is HTML or header sniffing, let's gut-check whether there's any reason we might not be seeing them.
Here's some info regarding the CMSs we're not picking up.
We currently don't have any HTML snippets or HTTP response headers to scan for the following CMSs:
We are scanning for the following CMSs using the means specified below. If we have particular examples of sites that are using any one of these three CMSs, then I can try to determine why they aren't being picked up.
Gotcha. So, for now, let's set aside the first 7 since we're not necessarily looking to further expand the detection methodologies.
For the latter 3, yeah, let's look at Ben's data to see if the ones he's found are ones that we should have, too, or maybe they have changed CMSes.
Note that if we expand the methodologies in the future, we should include Wagtail CMS, since that was requested.
For Joomla, Percussion, and SilverStripe, I think those are being detected in Ben's data by looking for a meta element whose name attribute is "generator" and whose content attribute contains a certain value. In the case of dni.gov:
<meta name="generator" content="Joomla! - Open Source Content Management">
We currently are not looking for these elements, which is why we don't have hits for these three CMSs.
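A rough sketch of what sniffing for these generator elements could look like, using only Python's standard library. The class and function names and the `GENERATOR_SIGNATURES` table are hypothetical, and the substrings for Percussion and SilverStripe are assumptions, since only the Joomla example appears in this thread.

```python
from html.parser import HTMLParser

# Hypothetical mapping from a lower-cased substring of the generator content
# to a CMS name. Only the Joomla value is taken from the dni.gov example.
GENERATOR_SIGNATURES = {
    "joomla": "Joomla",
    "percussion": "Percussion",
    "silverstripe": "SilverStripe",
}


class GeneratorMetaParser(HTMLParser):
    """Collect the content of every <meta name="generator"> element."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes; treat those as "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generators.append(attr_map.get("content", ""))


def detect_cms_from_html(html):
    """Return a sorted list of CMS names found in generator meta elements."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    found = set()
    for content in parser.generators:
        lowered = content.lower()
        for needle, cms in GENERATOR_SIGNATURES.items():
            if needle in lowered:
                found.add(cms)
    return sorted(found)
```

Matching on a lower-cased substring rather than the full string keeps the check tolerant of version numbers and taglines that sites append to the generator value.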
Makes sense. In our conversation, I think you said that we could pretty easily expand our HTML sniffing to cover those too, so that's the next step.
This is done in local development. We can close it once the deploy is done and confirmed.
These changes are live and the snapshot has been updated.
Great!
this is done - great work.
Following up on the great work of #311, we should see if we can improve the methodology so as to detect more examples of CMSes in action.
To elaborate, we are looking for about 30 different snippets that would suggest different CMSes but are only seeing results for 4-5 of them.
Based on @benbalter's good work (summary here), I would expect to see more CMSes represented, especially SharePoint, Joomla, Percussion, and maybe Liferay. (source data here)