appirio-tech / connect-app

Build your next project on Connect with the power of crowdsourcing
https://connect.topcoder.com
43 stars 139 forks source link

Character are HTML encoded in notification email subject #3485

Open vikasrohit opened 4 years ago

vikasrohit commented 4 years ago

In connect notification emails for messaging, the subject text is HTML encoded which caused e.g. & to be rendered as &.

Screen Shot 2020-01-07 at 10 49 20 AM

fyi @maxceem

maxceem commented 4 years ago

See a demo project: https://connect.topcoder-dev.com/projects/8720

We can have 2 different kind of situations.

Encoding in project name

We store project name already encoded in the DB:

image

  1. This leads to the fact, that subject is encoded twice: once already in DB, and second type before outputting it:

    image

  2. Also, as project name is already encoded, it is shown encoded inside email too:

    image

Notification Kafka event for this case:

{
  "topic": "connect.notification.project.created",
  "originator": "tc-project-service",
  "timestamp": "2020-01-17T09:16:46.679Z",
  "mime-type": "application/json",
  "payload": {
    "projectId": 8720,
    "projectName": "Test & HTML Encoding ",
    "refCode": "We will ask you several questions in order to determine your project’s scope. All estimates are based on our 15 years of experience and thousands of projects. Final prices will be determined after our team completes a final scope review.",
    "projectUrl": "https://connect.topcoder-dev.com/projects/8720",
    "userId": 40152856,
    "initiatorUserId": 40152856
  }
}

Encoding in topics/posts

Unlike project name we don't encode topics/post inside the DB.

  1. So in the subject it got encoded only once:

    image

  2. And inside the email, it is shown correctly

    image

Notification Kafka event for this case:

{
  "topic": "notifications.connect.project.post.created",
  "originator": "tc-message-service",
  "timestamp": "2019-01-17T09:16:46.679Z",
  "mime-type": "application/json",
  "payload": {
    "topicId": 34901,
    "topicTitle": "Test Topic & HTML encoding <MAX>",
    "postContent": "Test Post & HTML encoding <MAX>",
    "userId": "40152856",
    "projectId": "8720",
    "initiatorUserId": "40152856",
    "migration": true,
    "tags": [
      "PRIMARY"
    ]
  }
}

Questions

Actually, in the TC Notification service, we can fix only the second issue with topics, as project name is already coming encoded.

There are several places where we can solve the issue with pre-encoded project name:

  1. The ideal solution I think would be to stop pre-encoding project name at all when we save it the DB. Though, as we already have many existent projects in DB with pre-encoded data, we would have to migrate.

  2. We keep using pre-encoded project name. But when sending events to Kafka for TC Notification Service we can decode project name. As have many events which might send project name, this might require to update code in multiple places.

    • Another disadvantage of this approach, if one day we get rid of special events for TC Notification Service and use the general event which we send for Project ES Processor, we couldn't fix the general events payload because it should have the same data as we have in DB.
  3. Update TC Notification Service to deal with pre-encoded Project name. This feels like not a good solution as Notification Service shouldn't know about such peculiarities. It's better to provide this service with unified data.

@vikasrohit which way would you prefer?

vikasrohit commented 4 years ago

@maxceem do you think the double encoding for the topic title in email's subject is happening because of sendgrid doing the encoding second time? or is it us who are doing the second encoding as well? I would go for storing the data in HTML without encoding. Does that ring any security issue in terms script injection? I am okay with having a script to update all such data (I guess, we only need to update project names which has & in it and should not be difficult to write such a update query).

maxceem commented 4 years ago

@maxceem do you think the double encoding for the topic title in email's subject is happening because of sendgrid doing the encoding second time? or is it us who are doing the second encoding as well?

Looks like SendGrid shouldn't do so https://github.com/sendgrid/nodemailer-sendgrid-transport/issues/18, so I guess we are doing it somewhere ourself. I guess finding this place where subject is encoded would be the main task in this issue.

Does that ring any security issue in terms script injection?

I don't see any security issues, as we are using React which takes care of it.

I am okay with having a script to update all such data (I guess, we only need to update project names which has & in it and should not be difficult to write such a update query).

I've checked it out, looks like we have all the Project fields encoded including JSONs:

image

If I disable this encoding it would look like this:

image

The encoding has been added in this commit https://github.com/topcoder-platform/tc-project-service/commit/da7a3132e0a3c833916160153272e5e137b7ad3a. I'm not sure what was the reason for this.

vikasrohit commented 4 years ago

@maxceem this encoding sanitizing was done because of security testing done by penetration team, I guess. And main purpose is to remove possibility of HTML/Script injection in the system. I am not sure how we can handle it without need of updating all projects in the system. If we go for updating all the projects in the system via a JS code, we have to make sure that we don't update the updatedAt and lastActivityAt fields of the such updated projects otherwise default sorting of the projects might change for users.

vikasrohit commented 4 years ago

Moving to next release.

maxceem commented 4 years ago

This task has two parts:

  1. find the place where we encode the subject of email and stop encoding it
  2. stop encoding projects data, this is a big task, so we would handle it a separate task https://github.com/topcoder-platform/tc-project-service/issues/502. Probably we cannot do this part fast.

As a part of the current issue we have to do only the first part. This would at least fix this issue for the notification about messages.

vikasrohit commented 4 years ago

@maxceem are you working on it yourself?

maxceem commented 4 years ago

I have to either find a way to formulate it properly for the community or fix myself.

maxceem commented 4 years ago

One extra encoding is fixed as per https://github.com/sendgrid/sendgrid-nodejs/issues/741 by replacing configuration of the teplate at sendgrid side from {{subject}} to {{{subject}}}.

One more extra encoding left as per https://github.com/topcoder-platform/tc-project-service/issues/502. This is a big task to do.

vikasrohit commented 4 years ago

@maxceem we can close this issue now, right? I mean we have separate ticket for the second part of the issue.

maxceem commented 4 years ago

Yes @vikasrohit we can close it now. The only question, should we create a copy of the ticket in Project Service inside Connect App repo so we keep it mind? Because when we have tickets in Project Service we keep them like in backlog and pay less attention.

So it depends on the priority.

vikasrohit commented 4 years ago

In that case, lets keep this issue as is.

maxceem commented 4 years ago

Needs some big job done for Project Service, moving to the next milestone.

maxceem commented 4 years ago

@vikasrohit when I get email invitation on DEV, I still get double encoding of the subject:

image

Should this fix https://github.com/appirio-tech/connect-app/issues/3485#issuecomment-612965140 be applied in several places for different emails?

vikasrohit commented 4 years ago

@maxceem I didn't updated the subject in dev template, just did that. Let me know if it fixes.

maxceem commented 4 years ago

Ah, yes, fixed now:

image

Only one extra encoding left.