danithaca / balalab-public

Automatically exported from code.google.com/p/balalab-public

Improve URL checking when creating new items: duplicity/views180 #74

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Right now we check URL duplication and validity on the backend when creating new items. Some improvements can be made:

1. Use Ajax to show whether the URL is a duplicate or invalid before the user hits the "submit" button, similar to a password or username check.
2. The URL validity check can happen on the client side instead of on the backend.
3. Periodically, we can use linkchecker.module to check link validity offline.
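
As a rough illustration of the validity check in item 2, here is a minimal, syntax-only sketch in Python. The function name `is_plausible_url` is hypothetical and not part of the project. It only checks that the string parses as an absolute http(s) URL with a host; it does not check that the link is reachable.

```python
from urllib.parse import urlsplit


def is_plausible_url(url: str) -> bool:
    """Return True if url looks like an absolute http(s) URL with a host.

    Hypothetical helper: purely syntactic, no network request is made.
    """
    url = url.strip()
    # Reject empty input and anything containing whitespace.
    if not url or any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError on malformed input such as "http://[bad".
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

The same rule could run on the client before submit and again on the backend, since a client-side check alone can be bypassed.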

Original issue reported on code.google.com by danith...@gmail.com on 21 Oct 2013 at 7:27

GoogleCodeExporter commented 9 years ago

Original comment by danith...@gmail.com on 21 Oct 2013 at 7:33

GoogleCodeExporter commented 9 years ago
see also:
https://drupal.org/node/760660
https://drupal.org/project/formmsgs
http://data.agaric.com/check-duplicate-titles-node-ajax-warn-immediately-before-entering-more-data-or-submitting
https://drupal.org/project/unique_field

Original comment by danith...@gmail.com on 21 Oct 2013 at 7:35

GoogleCodeExporter commented 9 years ago
Also, note that some URLs point to the same article and differ only in a few query params added for performance/campaign purposes. For example:
http://www.nytimes.com/2013/10/22/opinion/a-new-day-in-new-jersey.html?_r=0&hp=&adxnnl=1&adxnnlx=1382461350-r/Nm8uwxIcC2spv57ZKI8g is essentially just
http://www.nytimes.com/2013/10/22/opinion/a-new-day-in-new-jersey.html.

The difficulty here is that we can't just remove the query string, because some query params are necessary to get the correct article.

The solution here might be to do offline checking for duplicate items.

Also, this might pose a problem for the <iframe> blacklist, where we need to match the URL. But for now we mainly use a domain blacklist, so it should be fine.
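
One possible middle ground for the duplicate check, sketched in Python: drop only query params on a known deny-list and keep everything else. The deny-list below is an assumption built from the NYT example above plus the common `utm_` prefix. It is not a list the project uses, and it would need to be maintained per site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed deny-list: params from the NYT example, treated here as non-essential.
TRACKING_PARAMS = {"_r", "hp", "adxnnl", "adxnnlx"}
# Assumed prefix for campaign params.
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Return url without deny-listed query params and without the fragment."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS and not name.startswith(TRACKING_PREFIXES)
    ]
    # Params that are not on the deny-list are kept, in their original order.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Two items whose `canonical_url()` values match would then be candidates for the offline duplicate review rather than being rejected automatically.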

Original comment by danith...@gmail.com on 22 Oct 2013 at 5:05

GoogleCodeExporter commented 9 years ago

Original comment by danith...@gmail.com on 22 Oct 2013 at 5:05

GoogleCodeExporter commented 9 years ago
drupal_http_request() doesn't handle redirection very well. For example:
http://fivethirtyeight.blogs.nytimes.com/2011/06/20/poll-finds-a-shift-toward-more-libertarian-views

The redirection code is 303, which is not automatically handled in drupal_http_request(). Currently we treat any HTTP status code that starts with '3' as valid, but this is a hack.

Perhaps we need to use cURL instead.
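
To make the alternative to the "any 3xx is valid" hack concrete, here is a language-neutral sketch in Python of following redirects (including 303) up to a limit and judging the final response instead. The `fetch` callable is a hypothetical stand-in for whatever HTTP client ends up being used; it is assumed to return the status code and the `Location` header, if any.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical transport: fetch(url) -> (status_code, location_header_or_None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve(url: str, fetch: Fetch, max_redirects: int = 5) -> Tuple[str, int]:
    """Follow redirects and return (final_url, final_status).

    Raises RuntimeError if more than max_redirects redirects are encountered.
    """
    for _ in range(max_redirects + 1):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url, status
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

With this shape, the validity decision is based on the status of the final URL, and a redirect loop surfaces as an error rather than being accepted as valid.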

Original comment by danith...@gmail.com on 29 Oct 2013 at 4:12

GoogleCodeExporter commented 9 years ago

Original comment by danith...@gmail.com on 15 Nov 2013 at 8:57

GoogleCodeExporter commented 9 years ago
The code has gone through significant changes. Forget about all previous comments.

Here's the current behavior:
1. When a user comes from "Share Articles/Videos", item/add/nojs?url=... automatically fires a Diffbot request on the client side and loads the item into the form.
2. When a user goes directly to item/add/nojs, the Diffbot call is not triggered.
3. It doesn't do any URL checking (assuming Diffbot handles invalid URLs).
4. At the backend, we can check for duplicate/views180 items periodically. But it only checks; fixing is left to an admin to do manually.

What needs to be fixed here:
1. Code should check article duplication and origin (not views180 code), perhaps at both the frontend and the backend.
2. Even if a user goes directly to item/add/nojs, Diffbot can still be triggered through the .change() event.

Right now, checking for duplicate items and views180 items at the backend seems to be the most flexible approach.
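
A minimal sketch of what the periodic, report-only backend check could look like, written in Python for illustration. The `(item_id, url)` input shape and the `find_duplicates` name are assumptions, not the project's actual data model. The `key` argument decides when two URLs count as the same item and defaults to an exact match.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def find_duplicates(
    items: Iterable[Tuple[int, str]],
    key: Callable[[str], str] = lambda url: url,
) -> Dict[str, List[int]]:
    """Group item ids by key(url) and return only groups with more than one id.

    Nothing is modified or deleted; the result is meant for manual admin review.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for item_id, url in items:
        groups[key(url)].append(item_id)
    return {k: ids for k, ids in groups.items() if len(ids) > 1}
```

Passing a URL-normalizing function as `key` would widen what counts as a duplicate, while keeping the check itself separate from any fixing step.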

Original comment by danith...@gmail.com on 17 Mar 2014 at 4:37

GoogleCodeExporter commented 9 years ago

Original comment by danith...@gmail.com on 16 Jul 2014 at 3:34